(54) Title: COMBINATORIAL POLYPEPTIDE ANTIGENS
(57) Abstract

This invention pertains to a set of polypeptide antigens having amino acid sequences derived from amino acid sequences
of a population of variants of a protein, or a portion thereof, and to methods of producing the set of polypeptide antigens. In
general, the method comprises: a) selecting a protein, or a portion thereof, which exhibits a population of N variants, represented by
the formula $A_1 A_2 A_3 \ldots A_{n-2} A_{n-1} A_n$, where $A_n$ is an amino acid occurring at amino acid position n of the protein, or portion
thereof; b) determining the number of times $O_n^{aa}$ each type of amino acid occurs at each amino acid position n in the N
variants; c) calculating the frequency of occurrence $(O_n^{aa}/N)_n$ of each type of amino acid at each amino acid position n in
the N variants; and d) generating a set of polypeptide antigens having amino acid sequences represented substantially by
the formula $A'_1 A'_2 A'_3 \ldots A'_{n-2} A'_{n-1} A'_n$, where $A'_n$ is defined as an amino acid type which occurs at greater than a selected
frequency at the corresponding amino acid position in the N variants.

COMBINATORIAL POLYPEPTIDE ANTIGENS

Background of the Invention

Host defense is a hallmark of vertebrate immune systems. To this end, antibodies
perform numerous functions in the defense against pathogens. For instance, antibodies
can neutralize a biologically active molecule, induce the complement pathway, stimulate
phagocytosis (opsonization), or participate in antibody-dependent cell-mediated
cytotoxicity (ADCC).

If the antibody binds to a site critical for the biological function of a molecule, the
activity of the molecule can be neutralized. In this way, specific antibodies can block
the binding of a virus or a protozoan to the surface of a cell. Similarly, bacterial and
other types of toxins can be bound and neutralized by appropriate antibodies. Moreover,
regardless of whether a bound antibody neutralizes its target, the resulting antigen-antibody
complex can interact with other defense mechanisms, resulting in destruction
and/or clearance of the antigen.

Parasites have evolved an array of mechanisms for avoiding an immune response.
Antigenic variation is perhaps the most studied of the evasion strategies, in part because
such variation makes vaccine development especially difficult. Generally, there are two
ways in which antigenic variation can occur: antigenic drift and antigenic shift.
Antigenic drift is relatively straightforward. Point mutations arise in genes encoding
pathogen antigens, altering some of the epitopes on the antigen such that
immunologic memory to the original antigen is not triggered by the mutant. As
immunity to one variant will not necessarily ensure immunity to others, accumulation of
such point mutations in a pathogen population can result in multiple infections of the
same host. Antigenic drift has been found in most pathogens (including viruses,
bacteria, and protozoa), its importance varying among individual species.

Many viruses are capable of great antigenic variation, and large numbers of
serologically distinct strains of these viruses have been identified. As a result, a
particular strain of a virus becomes insusceptible to immunity generated in the
population by previous infection or vaccination. For instance, the progress of HIV-1
vaccine development has been impeded by the amino acid sequence variability among
different isolates of HIV-1. This variability is particularly high in the external envelope
protein gp120, which is the primary target for antibodies that neutralize virus infectivity
(Robey et al. (1986) PNAS 83:7023; Putney et al. (1986) Science 234:1392; and Rusche
et al. (1987) PNAS 84:6924, incorporated by reference herein). Studies in humans and
mice have revealed a small region of gp120, termed the V3 loop or principal neutralizing
determinant (PND), comprising about 35 residues between two invariant, disulfide-crosslinked
cysteines (Cys-303 to Cys-338: HIV-1 nomenclature of Takahashi et al.
(1992) Science 255:333), that evokes the major neutralizing antibodies to the virus
(Palker et al. (1988) PNAS 85:1932; Rusche et al. (1988) PNAS 85:3198; and Goudsmit
et al. (1988) PNAS 85:4478). While this same region is one of the most variable in
sequence among different clonal isolates (Takahashi et al. (1992) Science 255:333),
analysis of the amino acid sequences of this domain revealed conservation to better than
80 percent of the amino acids in 9 out of 14 positions in the central portion of the V3
loop, suggesting that there are constraints on the V3 loop variability (LaRosa et al.
(1990) Science 249:932). However, because of this variability, neutralizing antibodies
elicited by the PND from one isolate generally do not neutralize isolates with PNDs of
different amino acid sequence.

Likewise, attempts to control influenza by vaccination have so far been of limited
success and are hindered by continual changes in the major surface antigens of influenza
viruses, the hemagglutinin (HA) and neuraminidase (NA), against which neutralizing
antibodies are primarily directed (Caton et al. (1982) Cell 31:417; Cox et al. (1983)
Bulletin W.H.O. 61:143; Eckert, E.A. (1973) J. Virology 11:183). The influenza viruses
have the ability to undergo a high degree of antigenic variation within a short period of
time. It is this property of the virus that has made it difficult to control the seasonal
outbreaks of influenza throughout the human and animal populations.

Through serologic and sequencing studies, two types of antigenic variation have
been demonstrated in influenza A viruses. Antigenic shift occurs primarily when either
HA or NA, or both, are replaced in a new viral strain with an antigenically novel HA
or NA. The occurrence of new subtypes created by antigenic shift usually results in
pandemics of infection.

Antigenic drift occurs in influenza viruses of a given subtype. Amino acid and
nucleotide sequence analysis suggests that antigenic drift occurs through a series of
sequential mutations, resulting in amino acid changes in the polypeptide and differences
in the antigenicity of the virus. The accumulation of several mutations via antigenic drift
eventually results in a subtype able to evade the immune response of a large number of
subjects previously exposed to a similar subtype. In fact, similar new variants have been
selected experimentally by passage of viruses in the presence of small amounts of
antibodies in mice or chick embryos. Antigenic drift gives rise to less serious outbreaks,
or epidemics, of infection. Antigenic drift has also been observed in influenza B viruses.

Purified antigen vaccines directed against the hepatitis B virus currently in
clinical trials generally consist of antigens of a single viral subtype. The rationale for
this decision has been that both S region and pre-S(2) region-specific antibodies, shown
to be somewhat effective in neutralizing the hepatitis B virus, are primarily group-specific.
However, this logic does not take into consideration the influence of viral
subtype on T-cell recognition. It has been demonstrated that the murine pre-S(2)-specific
T-cell response is highly subtype-specific (Milich et al. (1990) J. Immunol. 144:3535).

The unicellular protozoon Plasmodium falciparum is the predominant pathogen
causing malaria in humans. The infection starts when sporozoites, present in the salivary
glands of Anopheles mosquitoes, are inoculated into the blood of susceptible hosts.
Sporozoites rapidly penetrate hepatocytes, in which they further develop into liver
schizonts. After maturation, infectious merozoites are released into the blood of the host
and invade erythrocytes, starting a new schizogonic cycle that is associated with the
clinical symptoms of malaria. The number of malaria cases predicted by the World
Health Organization is over 100 million worldwide.

Limited success has been reported in the protection of monkeys against infection
by certain species of Plasmodium by immunization with purified surface antigens
expressed during at least one stage of the life cycle of the parasite. For instance, an
attractive candidate for a blood-stage vaccine is the merozoite protein termed p190, or
polymorphic schizont antigen (Herrera et al. (1992) Infection and Immunity 60:154-158;
Merkli et al. (1984) Nature 311:379-382; Mackay et al. (1985) EMBO J. 4:3823-3829).
p190 is a large glycoprotein which is synthesized and extensively processed during
merozoite formation, the 80 kDa processing product of which is the major coat protein
of mature merozoites. Monoclonal antibody probes against p190 and primary sequence
analysis reveal that the antigen contains polymorphic sequences giving rise to antigenic
variation among species and subspecies of Plasmodium.

Summary of the Invention

This invention pertains to a set of polypeptide antigens having amino acid
sequences derived from amino acid sequences of a population of variants of a protein, or
a portion thereof, and to methods of producing the set of polypeptide antigens. In
general, the method comprises:

a. selecting a protein, or a portion thereof, which exhibits a population of N
variants, represented by the formula

$A_1 A_2 A_3 \ldots A_{n-2} A_{n-1} A_n$,

where $A_n$ is an amino acid occurring at amino acid position n of the protein,
or portion thereof;

b. determining the number of times $O_n^{aa}$ each type of amino acid occurs at
each amino acid position n in the N variants;

c. calculating the frequency of occurrence $(O_n^{aa}/N)_n$ of each type of amino
acid at each amino acid position n in the N variants; and

d. generating a set of polypeptide antigens having amino acid sequences
represented substantially by the formula

$A'_1 A'_2 A'_3 \ldots A'_{n-2} A'_{n-1} A'_n$,

where $A'_n$ is defined as an amino acid type which occurs at greater than a
selected frequency at the corresponding amino acid position in the N
variants.
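
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in the specification: the function names `select_residues` and `enumerate_antigens`, the six-residue toy sequences, and the 15% threshold are all assumptions chosen for the example.

```python
from collections import Counter
from itertools import product


def select_residues(variants, threshold):
    """For each aligned position, list residue types whose frequency exceeds threshold."""
    n_variants = len(variants)
    length = len(variants[0])
    if any(len(v) != length for v in variants):
        raise ValueError("variants must be aligned to a common length")
    selected = []
    for pos in range(length):
        counts = Counter(v[pos] for v in variants)                 # step b: occurrences
        freqs = {aa: c / n_variants for aa, c in counts.items()}   # step c: frequencies
        selected.append(sorted(aa for aa, f in freqs.items() if f > threshold))  # step d
    return selected


def enumerate_antigens(selected):
    """Enumerate every sequence obtainable by picking one selected residue per position."""
    return ["".join(combo) for combo in product(*selected)]


# Hypothetical toy population of N = 4 aligned variants (not taken from the source).
variants = ["GPGRAF", "GPGRAL", "GPGKAF", "GPGRAF"]
selected = select_residues(variants, threshold=0.15)
antigens = enumerate_antigens(selected)
print(selected)
print(antigens)
```

In this toy case the enumerated set contains a sequence (`GPGKAL`) that is absent from the input population, which illustrates the combinatorial character of the set.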

In a preferred embodiment, the set of polypeptide antigens is generated by
determining a degenerate oligonucleotide sequence having a minimum number of
nucleotide combinations at each codon position n, which combinations include at least
the codons coding for each type of amino acid $A'_1$ to $A'_n$. The degenerate
oligonucleotide is incorporated into an expressible gene to create a gene set. The gene
set is expressed in an appropriate expression system to generate the set of polypeptide
antigens:

$A'_1 A'_2 A'_3 \ldots A'_{n-2} A'_{n-1} A'_n$

Typically, n will range from 8 to about 150. The threshold frequency selected for
inclusion of an amino acid in the set of polypeptide antigens at a given position can be
set at the same value for all amino acid positions. Alternatively, the selected frequency
threshold can be set individually for each amino acid position and can vary from position
to position. Typically the threshold frequency will range from 5-15%. This means that
for inclusion of a particular amino acid at a given position in the set of polypeptide
antigens generated, the amino acid must appear in that position in more than 5-15% of
the N variants (i.e., $O_n^{aa}/N$ must be greater than 5-15%).

The set of polypeptide antigens can correspond to an entire peptide sequence of a
protein, or to only a portion thereof. In addition, the variant sequence need not be
contiguous in the protein; it can be dispersed within a single protein or protein subunit or
in more than one protein or protein subunit.

The set of polypeptides can be used as an artificial vaccine. The antigens can be
administered to the host organism in a physiologically acceptable vehicle and under a
dosage regimen sufficient to create protective immunity against a population of variants
of the antigen. For instance, if the protein is a component of a pathogen, the protein or
portion thereof may include sequences comprising one or more neutralizing epitopes
such that the set of polypeptide antigens, when administered as an immunogen, results in
the production of neutralizing antibodies to the pathogen by a host organism.

Alternatively, the form as well as the route of administration of the set of
polypeptide antigens can be adjusted so as to be tolerogenic and thereby create artificial
tolerance in a host organism to a variant protein antigen, or a portion thereof.

Detailed Description of the Invention

Either clinically evident or inapparent infection by a pathogen can lead to 
immunity, and the immunity to pathogens of the same antigenic structure appears to be 
long-lasting. However, reinfection by the pathogen can be caused by variants with 
minor antigenic differences. Further, as immunity is highly epitope-specific, artificially
induced immunity to a pathogen is often limited by marked antigenic variation of the
pathogen. Therefore, the ability of a pathogen to undergo antigenic drift often results in 
the ineffectiveness of a conventional vaccine. 

This invention provides a method of generating a set of polypeptide antigens 
derived from a protein (or portion thereof) which is expressed with some degree of 
sequence heterogeneity among naturally or artificially induced variants of the protein.
The purpose is to provide a mix of antigens which can be used to immunize against the 
variants and, preferably, possible unknown or new variants that may arise. 

According to the method of this invention, the frequency of occurrence for each 
amino acid type is determined for each amino acid position of the protein (or a portion of 
the protein) in the population of variants. Amino acids which appear at each position
above a predetermined frequency in the population of variants (e.g., 5%, 10%, 15%, etc.) 
are selected for inclusion to generate the set of polypeptide antigens. 

In general, polypeptides will range from 8 to 150 amino acids, preferably from
about 10 to 50 amino acids, in length.

In the preferred embodiment, the set of polypeptide antigens is produced by way
of a degenerate oligonucleotide. The sequence of the degenerate oligonucleotide can be
determined so as to yield the minimum number of nucleotide combinations, for each
codon, necessary to give rise to each amino acid type selected for inclusion (i.e., those
occurring at greater than a selected frequency at the corresponding amino acid position n
in the population of variants).

The mixture of synthetic oligonucleotides can be enzymatically ligated into gene
sequences such that the set of polypeptide antigens is expressible as individual
polypeptides, or as a set of larger fusion proteins containing the set of polypeptide
antigens therein. Alternatively, the set of polypeptide antigens can be generated by
organo-chemical peptide synthesis. Each round of amino acid coupling can be carried
out to yield a determined heterogeneity of amino acid types (by including more than one
activated amino acid at appropriate steps in the synthesis) at a given amino acid position.



WO 94/00151 PCT/US93/05899



The amino acid mix is based on the frequency analysis of the N variants. Alternatively, 
the amino acid mix can be determined based upon the nucleotide (codon) combination 
generated in making the degenerate oligonucleotide. 

The combinatorial effect arising from the use of degenerate oligonucleotide 
sequences will typically give rise to some amino acid types which do not occur in the 
original population of variants. This provides for the generation of polypeptides, within 
the overall set of polypeptide antigens, which may not have arisen in nature. Because 
these nucleotide combinations are based initially on known variants, the codons which 
arise for additional amino acids potentially represent mutations more probable in nature, 
rather than those artificially created by a structural analysis of the protein. Thus, the set 
of polypeptide antigens can result in immunity to a wide range of potential variants as 
well as to known variants of the protein. 

To analyze the sequences of a population of variants of a protein, the amino acid
sequences of interest can be aligned relative to sequence homology. The presence or
absence of amino acids from an aligned sequence of a particular variant is relative to a
chosen consensus length of a reference sequence, which can be real or artificial. In order
to maintain the highest homology in alignment of sequences, deletions in the sequence
of a variant relative to the reference sequence can be represented by an amino acid space
(*), while insertion mutations in the variant relative to the reference sequence can be
disregarded and left out of the sequence of the variant when aligned. For instance,
demonstrated below are two possible alignments of three sequences of the V3 loop of
HIV isolates of known tropism (Hwang et al. (1991) Science 253:71-76).

The sequences,

BaL         -CTRPNNNTRKSIHIGPGRALYTTGEIIGDIRQAHC-    (Seq ID No. 1)
HTLV-IIIB   -CTRPNNNTRKKIRIQRGPGRAFVTIGKIGNMRQAHC-   (Seq ID No. 2)
SF162       -CTRPNNNTRKSITIGPGRAFYATGDIIGDIRQAHC-    (Seq ID No. 3)

can be aligned as:

             1        10        20        30
             |        |         |         |
BaL         -CTRPNNNTRKSIHI**GPGRALYTTGEIIGDIRQAHC-  (Seq ID No. 1)
HTLV-IIIB   -CTRPNNNTRKKIRIQRGPGRAFVTIGK*IGNMRQAHC-  (Seq ID No. 2)
SF162       -CTRPNNNTRKSITI**GPGRAFYATGDIIGDIRQAHC-  (Seq ID No. 3)

in which residues 15 and 16 of the original HTLV-IIIB strain are included in the
alignment, or alternatively as:

             1        10        20        30
             |        |         |         |
BaL         -CTRPNNNTRKSIHIGPGRALYTTGEIIGDIRQAHC-  (Seq ID No. 1)
HTLV-IIIB   -CTRPNNNTRKKIRIGPGRAFVTIGK*IGNMRQAHC-  (Seq ID No. 4)
SF162       -CTRPNNNTRKSITIGPGRAFYATGDIIGDIRQAHC-  (Seq ID No. 3)

in which residues 15 and 16 of the original HTLV-IIIB strain are discarded from the
alignment of the sequences.

Given N variants of the protein, and the number of times $O_n^{aa}$ which a given amino
acid (aa) occurs at a given position n, the frequency of occurrence for that amino acid at
that position n is calculated by $O_n^{aa}/N$. The frequency at which an amino acid deletion
occurs at a given position can be factored into this calculation as well.

Alternatively, if the deletions are not considered in the frequency calculation, then
it may be desirable that the value of N used in the calculation at a given amino acid
position n should be the number of variants less the number of variants in which an
amino acid space is present at that given position. Thus, in the first example of
alignment, $O_{16}^{R}/N = .333$ (33%) and $O_{16}^{*}/N = .667$ (67%) if the amino acid space is
defined as an amino acid type, and $O_{16}^{R}/N = 1.0$ (100%) if it is not.
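
The position-16 calculation for the first alignment can be reproduced with a short Python sketch. The function name `position_frequencies` and its `count_gaps` flag are assumptions introduced for illustration; the three aligned sequences are those shown in the first alignment above.

```python
from collections import Counter

# First alignment of the V3 loop sequences; '*' marks an amino acid space.
ALIGNED = {
    "BaL":       "CTRPNNNTRKSIHI**GPGRALYTTGEIIGDIRQAHC",
    "HTLV-IIIB": "CTRPNNNTRKKIRIQRGPGRAFVTIGK*IGNMRQAHC",
    "SF162":     "CTRPNNNTRKSITI**GPGRAFYATGDIIGDIRQAHC",
}


def position_frequencies(sequences, position, count_gaps=True, gap="*"):
    """Return {residue: frequency} at a 1-based aligned position.

    With count_gaps=False, variants carrying a space at this position are
    left out of both the counts and the denominator N.
    """
    column = [seq[position - 1] for seq in sequences]
    if not count_gaps:
        column = [residue for residue in column if residue != gap]
    n = len(column)
    return {residue: count / n for residue, count in Counter(column).items()}


seqs = list(ALIGNED.values())
with_gaps = position_frequencies(seqs, 16, count_gaps=True)
without_gaps = position_frequencies(seqs, 16, count_gaps=False)
print(with_gaps)      # R in 1 of 3 variants, space in 2 of 3
print(without_gaps)   # R in 1 of 1 variants
```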

Based upon the determination of the frequency of occurrence of amino acid types 
at each position n in the population of variants, a "threshold value" for inclusion of a 
particular amino acid type at the corresponding position n for the set of polypeptide 
antigens is determined. A degenerate oligonucleotide sequence can then be created. The 
degenerate oligonucleotide sequence is designed to have the minimum number of 
nucleotide combinations necessary, at each codon position, to give rise to codons for 
each amino acid type selected based upon the chosen threshold value. 

Thus, if the population of N variants is represented by the general formula

$A_1 A_2 A_3 \ldots A_{n-2} A_{n-1} A_n$,

where each variable $A_n$ represents an amino acid occurring at the nth amino acid
position of the protein, then a set of polypeptide antigens generated from the degenerate
oligonucleotide sequence can be represented by the general formula

$A'_1 A'_2 A'_3 \ldots A'_{n-2} A'_{n-1} A'_n$,

where each variable $A'_n$ represents an amino acid type coded for by a possible
nucleotide combination at the corresponding codon position n in the degenerate
oligonucleotide sequence.

The threshold frequency used to select types of amino acids for inclusion in the
set of polypeptide antigens and, accordingly, for determining the degenerate
oligonucleotide sequence, can be applied uniformly to each amino acid position. For
instance, a threshold value of 15 percent can be applied across the entire protein
sequence. Alternatively, the threshold value can be set for each amino acid position n
independently. For example, the threshold value can be set at each amino acid position n
so as to include the most commonly occurring amino acid types, e.g., those which appear
at that position in at least 90% of the N variants.
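
The two ways of setting the threshold can be contrasted in a small Python sketch. It assumes one particular reading of the per-position rule, namely that the most common amino acid types are added until together they account for at least 90% of the variants at that position; that reading, the function names, and the example frequencies are all assumptions made for illustration.

```python
def select_by_threshold(freqs, threshold):
    """Uniform rule: keep residue types whose frequency exceeds the threshold."""
    return sorted(aa for aa, f in freqs.items() if f > threshold)


def select_by_coverage(freqs, coverage=0.90):
    """Per-position rule (assumed reading): add the most common residue types
    until their combined frequency reaches the requested coverage."""
    chosen, total = [], 0.0
    for aa, f in sorted(freqs.items(), key=lambda item: (-item[1], item[0])):
        if total >= coverage:
            break
        chosen.append(aa)
        total += f
    return chosen


# Hypothetical frequencies at one position (not taken from the source).
freqs = {"R": 0.70, "K": 0.22, "S": 0.05, "G": 0.03}
uniform = select_by_threshold(freqs, 0.15)
covered = select_by_coverage(freqs, 0.90)
print(uniform, covered)
```

Under these example frequencies both rules keep the same two residue types; they diverge when many rare types share a position, where a coverage rule effectively lowers the cutoff.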

It may in some instances be desirable to apply a further criterion to the
determination of a degenerate oligonucleotide sequence which comprises restricting the
degeneracy of a codon position such that no more than a given number of amino acid
types can arise at the corresponding amino acid position in the set of polypeptide
antigens. For example, the degenerate sequence of a given codon position n can be
restricted such that selected amino acids will occur in at least about 11% of the
polypeptides of the polypeptide antigen set. This means that all of the possible
nucleotide combinations of that degenerate codon will give rise to no more than 9
different amino acids at the position. Thus, the frequency at which a particular amino
acid appears at a given position will depend on the possible degeneracy of the
corresponding codon position. Preferably, the percentage will be 11.1 (9 different amino
acids), 12.5 (8 different amino acids), 16.6 (6 different amino acids), 25 (4 different
amino acids) or 50 (2 different amino acids).
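
The relationship between these percentages and the number of amino acid types can be checked with a few lines of Python. The sketch assumes that each amino acid type at a degenerate position is equally represented in the antigen set, which is a simplification; the function names are hypothetical.

```python
def equal_share_percent(num_types):
    """Share of the antigen set carrying each type, assuming equal representation."""
    if num_types < 1:
        raise ValueError("need at least one amino acid type")
    return 100.0 / num_types


def max_types_for_minimum_share(min_percent):
    """Largest number of equally represented types that keeps each at or above min_percent."""
    return int(100.0 // min_percent)


for k in (9, 8, 6, 4, 2):
    print(k, round(equal_share_percent(k), 1))
print(max_types_for_minimum_share(11))
```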

Likewise, criteria used for choosing the population of variants for frequency
analysis can be determined by such factors as the expected utility of the polypeptide
antigen set and factors concerning vaccination or tolerization. For example, analysis of a
variant protein sequence can be restricted to subpopulations of a larger population of
variants of the protein based on factors such as epidemiological data, including
geographic occurrence, or alternatively, on known allele families (such as variants of the
DQ HLA class II allele). Likewise, in the case of protein components of pathogens, the
population of variants selected for analysis can be chosen based on known tropisms for a
particular susceptible host organism.

There are many ways by which the set of polypeptide antigens can be generated 
from the degenerate oligonucleotide sequence. Chemical synthesis of a degenerate
oligonucleotide can be carried out in an automatic DNA synthesizer, and the synthetic
oligonucleotides can then be ligated into an appropriate gene for expression. A start 
codon (ATG) can be engineered into the sequence if desired. The degenerate 
oligonucleotide sequences can be incorporated into a gene construct so as to allow 
expression of a protein consisting essentially of the set of polypeptide antigens. 
Alternatively, the set of polypeptide antigens can be expressed as parts of fusion 
proteins. The gene library created can be brought under appropriate transcriptional 
control by manipulation of transcriptional regulatory sequences. It may be desirable to 
create fusion proteins containing a leader sequence which directs transport of the 
recombinant proteins along appropriate cellular secretory routes. 

Various methods of chemically synthesizing polydeoxynucleotides are known,
including solid-phase synthesis which, like peptide synthesis, has been fully automated
in commercially available DNA synthesizers (see the Itakura et al. U.S. Patent No.
4,598,049; the Caruthers et al. U.S. Patent No. 4,458,066; and the Itakura U.S. Patent
Nos. 4,401,796 and 4,373,071, incorporated by reference herein).

The purpose of a degenerate set of oligonucleotides is to provide, in one mixture,
all of the sequences encoding the desired set of polypeptide antigens. It will generally
not be practical to synthesize each oligonucleotide of this mixture one by one,
particularly in the case of great numbers of possible variants. In these instances, the
mixture can be synthesized by a strategy in which a mixture of coupling units
(nucleotide monomers) is added at the appropriate positions in the sequence such that
the final oligonucleotide mixture includes the sequences coding for the desired set of
polypeptide antigens. Conventional techniques of DNA synthesis take advantage of
protecting groups on the reactive deoxynucleotides such that, upon incorporation into a
growing oligomer, further coupling to that oligomer is inhibited until a subsequent
deprotecting step is provided. Thus, to create a degenerate sequence, more than one type
of deoxynucleotide can be simultaneously reacted with the growing oligonucleotide
during a round of coupling, either by premixing nucleotides or by programming the
synthesizer to deliver appropriate volumes of nucleotide-containing reactant solutions.
For each codon position corresponding to an amino acid position having only one amino
acid type in the eventual set of polypeptide antigens, each oligonucleotide of the
degenerate set of oligonucleotides will have an identical nucleotide sequence. At a
codon position corresponding to an amino acid position at which more than one amino
acid type will occur in the eventual set, the degenerate set of oligonucleotides will
comprise nucleotide sequences giving rise to codons which code for those amino acid
types at that position in the set. In some instances, due to other combinations that the
degenerate nucleotide sequence can have, the resulting oligonucleotides will have
codons directed to amino acid types other than those designed to be present based on
analysis of the frequency of occurrence in the variants. The synthesis of degenerate
oligonucleotides is well known in the art (see for example Narang, SA (1983)
Tetrahedron 39:3; Itakura et al. (1981) in Recombinant DNA, Proc. 3rd Cleveland
Sympos. Macromolecules, ed. AG Walton, Amsterdam: Elsevier, pp. 273-289; Itakura et
al. (1984) Annu. Rev. Biochem. 53:323; Itakura et al. (1984) Science 198:1056; Ike et
al. (1983) Nucleic Acid Res. 11:477, incorporated by reference herein).

To further illustrate this technique, as is well known in the art, genes that code for 
proteins specify amino acid sequence by the order of deoxyribonucleotides in the DNA, 
but more directly by the sequence of ribonucleotides in their mRNA transcript. An
important feature of the genetic code is that all but two amino acids are encoded by more 
than one nucleotide triplet (codon). The genetic code (in terms of deoxyribonucleotides) 
can be depicted as follows. 



TABLE 1

First                  Second Position                 Third
Position                                               Position
(5')           T         C         A         G         (3')

T              Phe       Ser       Tyr       Cys       T
               Phe       Ser       Tyr       Cys       C
               Leu       Ser       Stp       Stp       A
               Leu       Ser       Stp       Trp       G

C              Leu       Pro       His       Arg       T
               Leu       Pro       His       Arg       C
               Leu       Pro       Gln       Arg       A
               Leu       Pro       Gln       Arg       G

A              Ile       Thr       Asn       Ser       T
               Ile       Thr       Asn       Ser       C
               Ile       Thr       Lys       Arg       A
               Met       Thr       Lys       Arg       G

G              Val       Ala       Asp       Gly       T
               Val       Ala       Asp       Gly       C
               Val       Ala       Glu       Gly       A
               Val       Ala       Glu       Gly       G

*Stp = stop codon

As noted above, one strategy of synthesizing the degenerate oligonucleotide
involves simultaneously reacting more than one type of deoxynucleotide during a given
round of coupling. For instance, if either a Histidine (His) or Threonine (Thr) was to
appear at a given amino acid position, the synthesis of the set of oligonucleotides could
be carried out as follows: (assuming synthesis were proceeding 3' to 5') the growing
oligonucleotide would first be coupled to a 5'-protected thymidine deoxynucleotide,
deprotected, then simultaneously reacted with a mixture of a 5'-protected adenine
deoxynucleotide and a 5'-protected cytidine deoxynucleotide. Upon deprotection of the
resulting oligonucleotides, another mixture of a 5'-protected adenine deoxynucleotide
and a 5'-protected cytidine deoxynucleotide is simultaneously reacted. The resulting
set of oligonucleotides will contain at that codon position either ACT (Thr), AAT (Asn),
CAT (His) or CCT (Pro). Thus, when more than one nucleotide of a codon is varied, the
use of nucleotide monomers in the synthesis can potentially result in a mixture of codons
including, but not limited to, those designed to be present by frequency analysis.
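
The His/Thr example can be verified by expanding the nucleotide mixtures in Python. This is a minimal sketch: the function name `expand_codon` is hypothetical, and the codon table is the standard genetic code of Table 1 written with one-letter amino acid codes (`*` for a stop codon).

```python
from itertools import product

# Standard genetic code in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand_codon(position_mixtures):
    """Map every codon obtainable from the per-position base mixtures to its amino acid.

    position_mixtures lists the bases offered at codon positions 1, 2 and 3
    (5' to 3'), e.g. ["AC", "AC", "T"].
    """
    return {
        "".join(bases): CODON_TABLE["".join(bases)]
        for bases in product(*position_mixtures)
    }


# A/C at the first position, A/C at the second, T at the third.
his_thr = expand_codon(["AC", "AC", "T"])
print(his_thr)
```

The expansion yields the two intended codons (CAT for His, ACT for Thr) together with AAT (Asn) and CCT (Pro), matching the four codons listed in the text.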

Table 1 can be employed to calculate degenerate nucleotide sequences having 
possible combinations corresponding to codons for given amino acid types such that at 
degenerate codon positions, the number of amino acid types beyond those selected by 
frequency analysis is minimized. For use in designating degenerate oligonucleotide 
sequences, the following IUPAC symbols and meanings are provided: 

TABLE 2

Symbol    Corresponds to

A         A: adenine
C         C: cytosine
G         G: guanine
T         T: thymine
M         A or C
R         A or G
W         A or T
S         C or G
Y         C or T
K         G or T
V         A or C or G; not T
H         A or C or T; not G
D         A or G or T; not C
B         C or G or T; not A
N         A or C or G or T or unknown
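
One way to use the genetic code and the IUPAC symbols together is an exhaustive search over all degenerate codons. The Python sketch below is an assumption about how such a search could be done, not the procedure of the specification: the function names are hypothetical, and the ranking (fewest unselected amino acid types first, then fewest codons, then alphabetical order) is an illustrative choice. A stop codon is counted as an extra type (`*`).

```python
from itertools import product

# IUPAC symbols mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

# Standard genetic code in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def encoded_amino_acids(degenerate_codon):
    """Set of amino acid types (or '*') encoded by a three-symbol IUPAC codon."""
    pools = [IUPAC[symbol] for symbol in degenerate_codon]
    return {CODON_TABLE["".join(bases)] for bases in product(*pools)}


def best_degenerate_codon(targets):
    """Find a degenerate codon covering all targets with the fewest extra types."""
    targets = set(targets)
    if not targets <= set(AMINO):
        raise ValueError("targets must be one-letter amino acid codes")
    best = None
    for symbols in product(sorted(IUPAC), repeat=3):
        codon = "".join(symbols)
        encoded = encoded_amino_acids(codon)
        if not targets <= encoded:
            continue
        n_codons = 1
        for symbol in symbols:
            n_codons *= len(IUPAC[symbol])
        key = (len(encoded - targets), n_codons, codon)
        if best is None or key < best[0]:
            best = (key, codon, encoded)
    return best[1], best[2]


codon, encoded = best_degenerate_codon("HT")
print(codon, sorted(encoded))
```

For His and Thr no degenerate codon avoids the two extra types Asn and Pro, so the search returns a four-codon mixture of the form M-M-(C or T), consistent with the worked example in the text.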

To create an amino acid space (deletion) at a given amino acid position, a portion
of the oligonucleotide mixture can be held aside during the appropriate rounds of
nucleotide additions (i.e., three coupling rounds per codon) so as to lack a particular
codon position altogether, then added back to the mixture at the start of synthesis of the
subsequent codon position.

The entire coding sequence for the polypeptide antigen set can be synthesized by 
this method. In some instances, it may be desirable to synthesize degenerate 
oligonucleotide fragments by this method, which are then ligated to invariant DNA 
sequences synthesized separately to create a longer degenerate oligonucleotide. 

Likewise, the amino acid positions containing more than one amino acid type in 
the generated set of polypeptide antigens need not be contiguous in the polypeptide 
sequence. In some instances, it may be desirable to synthesize a number of degenerate 
oligonucleotide fragments, each fragment corresponding to a distinct fragment of the 
coding sequence for the set of polypeptide antigens. Each degenerate oligonucleotide 
fragment can then be enzymatically ligated to the appropriate invariant DNA sequences 
coding for stretches of amino acids for which only one amino acid type occurs at each 
position in the set of polypeptide antigens. Thus, the final degenerate coding sequence is 
created by fusion of both degenerate and invariant sequences. 

These methods are useful when the frequency-based mutations are concentrated
in portions of the polypeptide antigen to be generated and it is desirable to synthesize long
invariant nucleotide sequences separately from the synthesis of degenerate nucleotide
sequences.

Furthermore, the degenerate oligonucleotide can be synthesized as degenerate
fragments and ligated together (e.g., complementary overhangs can be created, or blunt-end
ligation can be used). It is common to synthesize overlapping fragments as
complementary strands, then anneal them and fill in the remaining single-stranded
regions of each strand. It will generally be desirable in instances requiring annealing of
complementary strands that the junction be in an area of little degeneracy.

The nucleotide sequences derived from the synthesis of a degenerate 
oligonucleotide sequence and encoding the set of polypeptide antigens can be used to 
produce the set of polypeptide antigens via microbial processes. Ligating the sequences 
into a gene construct, such as an expression vector, and transforming or transfecting into 
hosts, either eukaryotic (yeast, avian or mammalian) or prokaryotic (bacterial cells), are 






WO 94/00151 






PC17US93/05899 






standard procedures used in producing other well-known proteins, e.g. insulin, 
interferons, human growth hormone, IL-1, IL-2, and the like. Similar procedures or 
obvious modifications thereof, can be employed to prepare the set of polypeptide 
antigens by microbial means or tissue-culture technology in accord with the subject 
invention. 

As stated above, the degenerate set of oligonucleotides coding for the set of 
polypeptide antigens, in the form of a library of gene constructs, can be ligated into a 
vector suitable for expression in either prokaryotic cells, eukaryotic cells, or both. 
Expression vehicles for production of the set of polypeptide antigens of this invention 
include plasmids or other vectors. For instance, suitable vectors for the expression of the 
degenerate set of oligonucleotides include plasmids of the types: pBR322, pEMBL 
plasmids, pEX plasmids, pBTac plasmids and pUC plasmids for expression in 
prokaryotic cells, such as E. coli. 

A number of vectors exist for the expression of recombinant proteins in yeast. 
For instance, YEP24, YIP5, YEP51, YEP52 and YRP17 are cloning and expression 
vehicles useful in the introduction of genetic constructs into S. cerevisiae (see for 
example Broach et al. (1983) in Experimental Manipulation of Gene Expression, ed. M. 
Inouye, Academic Press, p. 83, incorporated by reference herein). These vectors can 
replicate in E. coli due to the presence of the pBR322 ori, and in S. cerevisiae due to the 
replication determinant of the yeast 2 micron plasmid. In addition, drug resistance 
markers such as ampicillin can be used. 

The preferred mammalian expression vectors contain both prokaryotic sequences 
to facilitate the propagation of the vector in bacteria, and one or more eukaryotic 
transcription units that are expressed in eukaryotic cells. The pSV2gpt, pSV2neo, pSV2-dhfr, 
pTk2, pRSVneo, pMSG, pSVT7, pko-neo and pHyg derived vectors are examples 
of mammalian expression vectors suitable for transfection of eukaryotic cells. These 
vectors are modified with sequences from bacterial plasmids such as pBR322 to 
facilitate replication and drug resistance selection in both prokaryotic and eukaryotic 
cells. Alternatively, derivatives of viruses such as the bovine papilloma virus (BPV-1) or 
Epstein-Barr virus (pHEBo and p205) can be used for transient expression of proteins in 
eukaryotic cells. The various methods employed in the preparation of the plasmids and 
transformation of host organisms are well known in the art. For other suitable 
expression systems for both prokaryotic and eukaryotic cells, as well as general recombinant 
procedures, see Molecular Cloning, 2nd Ed., ed. by Sambrook, Fritsch and Maniatis 
(Cold Spring Harbor Laboratory Press: 1989), incorporated by reference herein. 

To express the library of gene constructs of the degenerate set of 
oligonucleotides, it may be desirable to include transcriptional and translational 
regulatory elements and other non-coding sequences in the expression construct. For 
instance, regulatory elements including constitutive and inducible promoters and 
enhancers can be incorporated. 

In some instances, it will be necessary to add a start codon (ATG) to the 
degenerate oligonucleotide sequence. It is well known in the art that a methionine at the 
N-terminal position can be enzymatically cleaved by the use of the enzyme methionine 
aminopeptidase (MAP). MAP has been cloned from E. coli (Ben-Bassat et al. (1987) J. 
Bacteriol. 162:751-757) and Salmonella typhimurium, and its in vitro activity has been 
demonstrated on recombinant proteins (Miller et al. (1987) PNAS 84:2718-1722). 
Therefore, removal of an N-terminal methionine, if desired, can be achieved either in vivo 
by expressing the set of polypeptide antigens in a host which produces MAP (e.g., E. 
coli or CM89 or S. cerevisiae), or in vitro by use of purified MAP (e.g., procedure of 
Miller et al.). 

Alternatively, the coding sequences for the polypeptide antigens can be 
incorporated as a part of a fusion gene including an endogenous protein for expression 
by the microorganism. For example, the VP6 capsid protein of rotavirus can be used as 
an immunologic carrier protein for the polypeptide antigen set, either in the monomeric 
form or in the form of a viral particle. The set of degenerate oligonucleotide sequences 
can be incorporated into a fusion gene construct which includes coding sequences for a 
late vaccinia virus structural protein to produce a set of recombinant viruses expressing 
fusion proteins comprising the set of polypeptide antigens as part of the virion. It has 
been demonstrated with the use of V-3 loop/Hepatitis B surface antigen fusion proteins 
that recombinant Hepatitis B virions can be utilized in this role as well. Similarly, 
chimeric constructs coding for fusion proteins containing the set of polypeptide antigens 
and the poliovirus capsid protein can be created to enhance immunogenicity of the set of 
polypeptide antigens. The use of such fusion protein expression systems to establish a 
set of polypeptide antigens has the advantage that often both B-cell proliferation in 
response to the immunogen can be elicited (see for example EP Publication No. 
0259149; and Evans et al. (1989) Nature 222:385; Huang et al. (1988) J. Virol. 62:3855; 
and Schlienger et al. (1992) J. Virol. 66:2, incorporated by reference herein). The 
Multiple Antigen Peptide (MAP) system for peptide-based vaccines can be utilized, in 
which the polypeptide antigen set is obtained directly from organo-chemical synthesis of 
the peptides onto an oligomeric branching lysine core (see for example Posnett et al. 
(1988) JBC 262:1719 and Nardelli et al. (1992) J. Immunol. 148:914, incorporated by 
reference herein). Foreign antigenic determinants can also be expressed and presented 
by bacterial cells. 

Techniques for making fusion genes are well known. Essentially, the joining of 
various DNA fragments coding for different polypeptide sequences is performed in 
accordance with conventional techniques, employing blunt-ended or stagger-ended 
termini for ligation, restriction enzyme digestion to provide for appropriate termini, 
filling-in of cohesive ends as appropriate, alkaline phosphatase treatment to avoid 
undesirable joining, and enzymatic ligation. Alternatively, the fusion gene can be 
synthesized by conventional techniques including automated DNA synthesizers. 

An alternative approach to generating the set of polypeptide antigens is to carry 
out the peptide synthesis directly. At each codon position n in the degenerate 
oligonucleotide, each possible nucleotide combination can be determined and the 
corresponding amino acid designated for inclusion at the corresponding amino acid 
position of the polypeptide antigen set. Thus, synthesis of a degenerate polypeptide 
sequence can be directed in which sequence divergence occurs at those amino acid 
positions at which more than one amino acid is coded for in the corresponding codon 
position of the degenerate oligonucleotide. Organo-chemical synthesis of polypeptides 
is well known and can be carried out by procedures such as solid state peptide synthesis 
using automated protein synthesizers. 

The synthesis of polypeptides is generally carried out through the condensation of 
the carboxy group of an amino acid, and the amino group of another amino acid, to form 
a peptide bond. A sequence can be constructed by repeating the condensation of 
individual amino acid residues in stepwise elongation, in a manner analogous to the 
synthesis of oligonucleotides. In such condensations, the amino and carboxy groups that 
are not to participate in the reaction can be blocked with protecting groups which are 
readily introduced, stable to the condensation reactions and selectively removable from 
the completed peptide. Thus, the overall process generally comprises protection, 
activation, coupling and deprotection. If a peptide involves amino acids with side chains 
that may react during condensation, the side chains can also be reversibly protected, 
removable at the final stage of synthesis. 

A successful synthesis for a large polypeptide by a linear strategy must achieve 
nearly quantitative recoveries for each chemical step. Many automated peptide synthesis 
schemes take advantage of attachment of the growing polypeptide chain to an insoluble 
polymer resin support such that the polypeptide can be washed free of 
excess reactants after each reaction step (see for example Merrifield (1963) J. Am. Chem. 
Soc. 85:2149; Chang et al. (1978) Int. J. Pept. Protein Res. 11:246; Barany and Merrifield, 
The Peptides, vol. 2, ©1979 NY: Academic Press, pp. 1-284; Tam, J.P. (1988) PNAS 
85:5409; and Tam et al. U.S. Patent No. 4,507,230, incorporated by reference herein). 
For example, a first amino acid is attached to a resin by a cleavable linkage to its 
carboxylic group, deblocked at its amino acid side, and coupled with a second activated 






amino acid carrying a protected α-amino group. The resulting protected dipeptide is 
deblocked to yield a free amino terminus, and coupled to a third N-protected amino acid. 
After many repetitions of these steps, the complete polypeptide is cleaved from the resin 
support and appropriately deprotected. 

To generate the set of polypeptide antigens, more than one N-protected amino 
acid type can be reacted simultaneously in each round of coupling with the growing 
polypeptide chain to create the desired degenerate amino acid sequence at each amino 
acid position. In one embodiment, the set of polypeptides will include only those amino 
acids that are present at any position n in the population of variants above the 
predetermined threshold frequency. Alternatively, one can first design the degenerate 
oligonucleotide, determine the amino acids encoded by the combination of codons and 
include all the amino acids in the chemical synthesis. For example, a degenerate codon 
at codon position n, having the sequence MMT and thus coding for either a Thr (ACT), 
an Asn (AAT), a His (CAT) or a Pro (CCT), can be created at the peptide synthesis level 
by reacting all four N-protected amino acid types simultaneously with the free amino 
terminus of the growing, resin-bound peptide. Thus, four subpopulations of peptides 
will be created, each subpopulation definable by the amino acid type present at the 
amino acid position n corresponding to the codon position n. 
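
The MMT example above can be checked mechanically. The following Python sketch is offered only as an illustration and is not part of the disclosed method: it assumes the standard IUPAC nucleotide ambiguity codes and the standard genetic code, and the helper name `expand_codon` is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand_codon(degenerate):
    """Map each concrete codon encoded by a degenerate triplet to its
    one-letter amino acid (hypothetical helper, for illustration)."""
    choices = [IUPAC[base] for base in degenerate.upper()]
    return {"".join(c): CODON_TABLE["".join(c)] for c in product(*choices)}


# M stands for A or C, so MMT expands to four codons.
print(expand_codon("MMT"))
```

Each amino acid returned for a codon position corresponds to one N-protected amino acid type to be included in the matching round of coupling.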

Because the amino acid being added to the resin-bound polypeptide is protected, 
the growth of the peptide chain is terminated upon addition of the protected amino acid 
until the subsequent deblocking step. Those skilled in the art will recognize that, due to 
potential differences in reactivity of various amino acid analogs, it may be desirable to 
use non-equimolar ratios of amino acid types when simultaneously reacting more than 
one amino acid type in order to get equimolar ratios of subpopulations. Alternatively, it 
may be desirable to divide the resin-bound polypeptide into aliquots, each of which is 
reacted with a distinct amino acid type, the polypeptide products being recombined prior 
to the next coupling reaction. This technique can be applied to create an amino acid gap 
in a subpopulation, simply by holding aside an appropriate aliquot during one round of 
coupling, then recombining all resin-bound polypeptides prior to the next round of 
coupling. Furthermore, it is apparent that, from the many different blocking and 
activating groups available, chemical synthesis of the polypeptide can be carried out in 
either the N-terminal to C-terminal, or C-terminal to N-terminal direction. 
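
The divide, couple and recombine procedure, including the held-aside aliquot that produces a gap, can be pictured with a small simulation. This is a sketch under simplifying assumptions (every coupling goes to completion, and each chain is written as a string in order of coupling); the function name `couple_round` and the residues used are hypothetical and chosen only for illustration.

```python
def couple_round(pool, residues, hold_aside=False):
    """Simulate one round of coupling on a pool of resin-bound chains.

    The pool is divided into one aliquot per residue type, each aliquot is
    coupled with its residue, and the aliquots are recombined. With
    hold_aside=True an additional aliquot skips the round unreacted, which
    yields a subpopulation with a gap at this position.
    """
    extended = {chain + residue for chain in pool for residue in residues}
    if hold_aside:
        extended |= set(pool)  # the held-aside aliquot rejoins unchanged
    return extended


# Hypothetical four-position design: an invariant Gly, a four-way degenerate
# position (Thr/Asn/His/Pro), an Arg that is absent in some chains, then Ala.
pool = {""}
for residues, gap in [("G", False), ("TNHP", False), ("R", True), ("A", False)]:
    pool = couple_round(pool, residues, hold_aside=gap)

print(sorted(pool))
```

Because chains are strings in order of coupling, the same sketch applies whichever direction of synthesis is used; only the reading direction of the strings changes.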

The generated set of polypeptide antigens can be covalently or noncovalently 
modified with non-proteinaceous materials such as lipids or carbohydrates to enhance 
immunogenicity or solubility. The present invention is understood to include all such 
chemical modifications of the set of polypeptide antigens so long as the modified peptide 
antigens retain substantially all the antigenic/immunogenic properties of the parent 

The generated set of polypeptide antigens can also be coupled with or 
incorporated into a viral particle, a replicating virus, or other microorganism in order to 
enhance immunogenicity. The set of polypeptide antigens may be chemically attached 
to the viral particle or microorganism or an immunogenic portion thereof. 

There are a large number of chemical cross-linking agents that are known to those 
skilled in the art. For the present invention, the preferred cross-linking agents are 
heterobifunctional cross-linkers, which can be used to link proteins in a stepwise 
manner. Heterobifunctional cross-linkers provide the ability to design more specific 
coupling methods for conjugating proteins, thereby reducing the occurrences of 
unwanted side reactions such as homo-protein polymers. A wide variety of 
heterobifunctional cross-linkers are known in the art. These include: succinimidyl 
4-(N-maleimidomethyl) cyclohexane-1-carboxylate (SMCC); m-maleimidobenzoyl-N-hydroxysuccinimide 
ester (MBS); N-succinimidyl (4-iodoacetyl) aminobenzoate 
(SIAB); succinimidyl 4-(p-maleimidophenyl) butyrate (SMPB); 1-ethyl-3-(3-dimethylaminopropyl) 
carbodiimide hydrochloride (EDC); 4-succinimidyloxycarbonyl-α-methyl-α-(2-pyridyldithio)-toluene 
(SMPT); N-succinimidyl 3-(2-pyridyldithio) 
propionate (SPDP); and succinimidyl 6-[3-(2-pyridyldithio) propionate] hexanoate 
(LC-SPDP). Those cross-linking agents having N-hydroxysuccinimide moieties can be 
obtained as the N-hydroxysulfosuccinimide analogs, which generally have greater water 
solubility. In addition, those cross-linking agents having disulfide bridges within the 
linking chain can be synthesized instead as the alkyl derivatives so as to reduce the 
amount of linker cleavage in vivo. 

The introduction of antigen into an animal initiates a series of events culminating 
in both cellular and humoral immunity. By convention, the property of a molecule that 
allows it to induce an immune response is called immunogenicity. The property of being 
able to react with an antibody that has been induced is called antigenicity. Antibodies 
able to cross-react with two or more different antigens can do so by virtue of some 
degree of structural and chemical similarity between the antigenic determinants (or 
"epitopes") of the antigens. A protein immunogen is usually composed of a number of 
antigenic determinants. Hence, immunizing with a protein results in the formation of 
antibody molecules with different specificities, the number of different antibodies 
depending on the number of antigenic determinants and their inherent immunogenicity. 

Proteins are highly immunogenic when injected into an animal for whom they are 
not normal ("self") constituents. Conversely, peptides and other compounds with 
molecular weights below about 5000 daltons (termed "haptens"), by themselves, do not 
generally elicit the formation of antibodies. However, if these small molecule antigens 
are first coupled with a longer immunogenic antigen such as a protein, antibodies can be 
raised which specifically bind epitopes on the small molecules. Conjugation of haptens 
to carrier proteins can be carried out as described above. 

When necessary, modification of such ligand to prepare an immunogen should 
take into account the effect on the structural specificity of the antibody. That is, in 
choosing a site on a ligand for conjugation to a carrier such as protein, the selected site is 
chosen so that administration of the resulting immunogen will provide antibodies which 
will recognize the original ligand. Furthermore, not only must the antibody recognize 
the original ligand, but significant characteristics of the ligand portion of the immunogen 
must remain so that the antibody produced after administration of the immunogen will 
more likely distinguish compounds closely related to the ligand which may also be 
present in the patient sample. In addition, the antibodies should have high binding 
constants. 

Vaccines comprising the generated set of polypeptide antigens, and variants 
thereof having antigenic properties, can be prepared by procedures well known in the art. 
For example, such vaccines can be prepared as injectables, e.g., liquid solutions or 
suspensions. Solid forms for solution in, or suspension in, a liquid prior to injection also 
can be prepared. Optionally, the preparation also can be emulsified. The active 
antigenic ingredient or ingredients can be mixed with excipients which are 
pharmaceutically acceptable and compatible with the active ingredient. Examples of 
suitable excipients are water, saline, dextrose, glycerol, ethanol, or the like, and 
combinations thereof. In addition, if desired, the vaccine can contain minor amounts of 
auxiliary substances such as wetting or emulsifying agents, pH buffering agents, or 
adjuvants such as aluminum hydroxide or muramyl dipeptide or variations thereof. In 
the case of peptides, coupling to larger molecules such as Keyhole limpet hemocyanin 
(KLH) sometimes enhances immunogenicity. The vaccines are conventionally 
administered parenterally, by injection, for example, either subcutaneously or 
intramuscularly. Additional formulations which are suitable for other modes of 
administration include suppositories and, in some cases, oral formulations. For 
suppositories, the traditional binders and carriers include, for example, polyalkylene 
glycols or triglycerides. Suppositories can be formed from mixtures containing the 
active ingredient in the range of about 0.5% to about 10%, preferably about 1% to about 
2%. Oral formulations can include such normally employed excipients as, for example, 
pharmaceutical grades of mannitol, lactose, starch, magnesium stearate, sodium 
saccharine, cellulose, magnesium carbonate, and the like. These compositions can take 
the form of solutions, suspensions, tablets, pills, capsules, sustained release formulations 
or powders and contain from about 10% to about 95% of active ingredient, preferably 
from about 25% to about 70%. 

The active compounds can be formulated into the vaccine as neutral or salt forms. 
Pharmaceutically acceptable salts include the acid addition salts (formed with the free 
amino groups of the polypeptides) which are formed with inorganic acids such as, 
for example, hydrochloric or phosphoric acids, or such organic acids as acetic, oxalic, 
tartaric, mandelic, and the like. Salts formed with the free carboxyl groups can also be 
derived from inorganic bases such as, for example, sodium, potassium, ammonium, 
calcium, or ferric hydroxides, and such organic bases as isopropylamine, 
trimethylamine, 2-ethylamino ethanol, histidine, procaine, and the like. A vaccine 
composition may include peptides containing T helper cell epitopes in combination with 
protein fragments containing the principal neutralizing domain. For instance, several of 
these epitopes have been mapped within the HIV envelope, and these regions have been 
shown to stimulate proliferation and lymphokine release from lymphocytes. Providing 
both of these epitopes in a vaccine comprising a generated set of polypeptide antigens 
derived from analysis of HIV-1 isolates can result in the stimulation of both the humoral 
and the cellular immune responses. In addition, commercial carriers and adjuvants are 
available to enhance immunomodulation of both B-cell and T-cell populations for an 
immunogen (for example, the Imject Supercarrier™ System, Pierce Chemical, Catalog 
No. 77151G). 

Alternatively, a vaccine composition may include a compound which functions to 
increase the general immune response. One such compound is interleukin-2 (IL-2), 
which has been reported to enhance immunogenicity by general immune stimulation 
(Nunberg et al. (1988) in New Chemical and Genetic Approaches to Vaccination, Cold 
Spring Harbor Laboratory, Cold Spring Harbor, NY). IL-2 may be coupled to the 
polypeptides of the generated set of polypeptide antigens to enhance the efficacy of 
vaccination. 

The vaccines are administered in a manner compatible with the dosage 
formulation, and in such amount as will be therapeutically effective and immunogenic. 
The quantity to be administered depends on the subject to be treated, the capacity of the 
subject's immune system to synthesize antibodies, and the degree of protection desired. 
Precise amounts of active ingredient required to be administered depend on the judgment 
of the practitioner and are peculiar to each individual. Suitable regimes for initial 
administration and booster shots are also variable, but are typified by an initial 
administration followed in one or two week intervals by a subsequent injection or other 
administration. 

Antigens that induce tolerance are called toleragens, to be distinguished from 
immunogens, which generate immunity. Exposure of an individual to immunogenic 
antigens stimulates specific immunity, and for most immunogenic proteins, subsequent 
exposures generate enhanced secondary responses. In contrast, exposure to a toleragenic 
antigen not only fails to induce specific immunity, but also inhibits lymphocyte 
activation by subsequent administration of immunogenic forms of the same antigen. 
Many foreign antigens can be immunogens or toleragens, depending on the 
physicochemical form, dose, and route of administration. This ability to manipulate 
responses to antigens can be exploited clinically to augment or suppress specific 
immunity. For instance, it can be desirable in the context of organ transplant technology 
to tolerize a transplant recipient with a polypeptide antigen set derived from the 
frequency analysis of known sub-haplotypes of a class II peptide (i.e., such as the DQ or 
DR allele products) present on the transplanted tissue in order to minimize rejection. It 
is also within the equivalence of this invention that the set of polypeptide antigens can be 
chemically coupled or incorporated as part of a fusion protein with an apoptotic agent, 
for instance an agent which brings about deregulation of c-myc expression or a cell 
toxin such as diphtheria toxoid, such that programmed cell death is brought about in an 
antigen-specific manner. 

Thus, it would be routine for one skilled in the art to determine the appropriate 
administration regimen necessary to induce tolerance to the set of polypeptide antigens 
of the present invention. 

The following example serves to further illustrate the present invention. 
Human Immunodeficiency Virus type I (HIV-1), the causative agent of Acquired 
Immunodeficiency Syndrome (AIDS), shows very marked sequence diversity between 
different isolates. The outer envelope glycoprotein gp120 of HIV-1, which facilitates 
binding of the virion to CD4, has been shown to be the major target of neutralizing 
antibodies. Studies in humans and mice have revealed a small region of this protein, 
termed the V3 loop, between cysteine residues 303 and 338, that evokes the major 
neutralizing antibodies to the virus. Recombinant or synthetic polypeptides containing 
V3 loop epitopes of an HIV-1 isolate have been shown to induce relatively high titers of 
strain-specific neutralizing antibodies. However, even single amino acid substitutions 
within the V3 loop are sufficient in some cases to greatly reduce antibody binding, in 
agreement with the strict specificity of neutralizing antibodies to the V3 loop. 

Table 3 sets forth the results obtained from frequency analysis of V3-loop 
sequences of a population of HIV-1 isolates obtained largely from AIDS patients in 
N.A. (Wolinsky et al. (1992) Science 255:1134; LaRosa et al. (1990) Science 
249:932; and Holley et al. (1991) PNAS 88:6800, incorporated by reference herein). 
The alignment of each variant sequence is relative to the reference sequence: 

CTRPNNN**TRKSIHI**GPGRAFY*TTGEIIGDIRQAHC (Seq ID No. 5) 

where cys-1 of the reference sequence corresponds to cys-303 of gp120 according to the 
nomenclature used herein. 
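
A frequency analysis of this kind reduces to counting, column by column, how often each amino acid type (or gap, written `*`) occurs in the aligned variants and dividing by the number of variants N. The Python sketch below illustrates the calculation; the four short aligned fragments are invented for the example and are not taken from the isolates analyzed, and the function name `position_frequencies` is hypothetical.

```python
from collections import Counter


def position_frequencies(aligned):
    """Return, for each alignment column, a dict of residue -> percent
    occurrence across the N aligned variants ('*' marks a gap)."""
    n = len(aligned)
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must be aligned to the same length")
    table = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned)
        # most_common() orders residues from most to least frequent
        table.append({aa: 100.0 * c / n for aa, c in counts.most_common()})
    return table


# Toy alignment (hypothetical data, N = 4 variants, 6 positions).
variants = ["CTRPN*", "CTRPS*", "CIRPN*", "CTRPNK"]
for position, freqs in enumerate(position_frequencies(variants), start=1):
    print(position, freqs)
```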

TABLE 3 

Frequency Analysis of N.A. Isolates of HIV-1 

V3 Position    Frequency of occurrence (expressed in percent) 

 1    C = 100 
 2    T = 86.3; I = 11.3; A = <1; L = <1; M = <1; P = <1; S = <1 
 3    R = 99.4; I = <1; K = <1 
 4    P = 97.3; H = 1.5; L = <1; S = <1 
 5    N = 86.3; S = 7.0; Y = 3.0; G = 1.8; D = <1; H = <1 
 6    N = 90.9; S = 3.4; D = 1.5; Y = 1.2; G = <1; I = <1; K = <1; T = <1 
 7    N = 92.4; T = 3.0; Y = 1.2; D = <1; H = <1; I = <1; K = <1; R = <1 
 8    * = 98.8; I = <2; K = <1 
 9    * = 99.7; K = <1 
10    T = 92.0; I = 1.8; V = 1.8; A = 1.5; K = <1; R = <1 
11    R = 88.9; K = 6.3; I = 1.8; E = <1; G = <1; M = <1; P = <1; Q = <1; T = <1 
12    K = 76.5; R = 17.4; Q = 2.8; N = 2.1; H = <2; E = <1; G = <1 
13    S = 64.8; G = 21.7; R = 10.6; * = <1; A = <1; H = <1; K = <1 
14    I = 96.1; L = 1.2; E = <1; F = <1; M = <1; T = <1; V = <1 
15    H = 49.2; N = 9.6; P = 9.0; R = 9.0; T = 8.4; Y = 6.2; S = 5.6; F = 1.5; A = <1; G = <1; K = <1; V = <1 
16    I = 72.4; M = 18.7; L = 2.0; T = 1.7; F = 1.2; V = 1.2; K = <1; S = <1; R = <1; Y = <1 
17    * = 97.3; Q = 2.5; R = <1 
18    * = 97.3; R = 2.5; G = <1 
19    G = 90.6; M = 7.4; A = <1; E = <1; I = <1; R = <1; T = <1 
20    P = 88.9; G = 7.9; L = 1.2; A = <1; Q = <1; S = <1 
21    G = 99.0; E = <1; R = <1 
22    R = 80.1; K = 11.9; S = 3.7; Q = 2.5; * = <1; G = <1; M = <1 
23    A = 86.0; V = 4.4; T = 4.2; K = 1.5; R = 1.5; N = 1.2; P = <1; S = <1; W = <1 
24    F = 75.9; W = 8.1; I = 7.1; V = 3.4; L = 2.5; Y = 1.7; S = <1; T = <1 
25    Y = 84.0; H = 6.4; V = 4.2; L = 2.0; F = 1.2; I = <1; M = <1; N = <1; R = <1 
26    * = 99.4; H = <1; T = <1 
27    A = 52.4; T = 43.9; V = 1.8; * = <1; Q = <1; S = <1; Y = <1 
28    T = 88.4; I = 3.7; R = 2.7; A = 2.4; Q = 1.2; K = <1; M = <1; P = <1; Y = <1 
29    G = 76.8; N = 4.9; E = 4.9; T = 2.7; H = 2.1; K = 1.8; R = 1.8; D = 1.2; A = <1; I = <1; P = <1; Q = <1; * = <1 
30    E = 30.8; D = 23.8; R = 9.82; K = 8.2; * = 7.6; Q = 7.9; N = 3.4; G = 2.7; A = <1; I = <1; S = <1; T = <1 
31    I = 87.5; F = 7.6; V = 2.7; L = <1; K = <1; M = <1; R = <1 
32    I = 77.7; V = 8.8; T = 6.1; * = 87.5; Q = 1.2; R = <2; A = <1; E = <1; G = <1; K = <1; L = <1; M = <1 
33    G = 96.6; E = 1.2; A = <1; K = <1; N = <1; R = <1; S = <1 
34    D = 84.2; N = 16.4; G = <1; I = <1; M = <1; R = <1; T = <1 
35    I = 92.1; M = 4.6; F = 2.1; E = <1; L = <1; T = <1 
36    R = 96.9; * = <1; E = <1; G = <1; K = <1; S = <1; C = <1 
37    Q = 82.6; K = 11.9; R = 3.5 
38    A = 100 
39    H = 90.6; Y = 4.5; R = 3.3; Q = 1.6 
40    C = 99.2; Y = <1 

(* represents an amino acid gap) 

From the frequency of occurrence data calculated in Table 3, we have determined 
the most commonly occurring amino acid types at each position which are collectively 
represented in at least 90% of the variants analyzed. The corresponding degenerate 
codon was selected for each of the amino acid positions and used to determine a 
degenerate oligonucleotide sequence, which includes the codons for 10 amino acid 
residues flanking the V3 loop sequences on either side, represented by the general 
sequence: 

5'- GTTCAGCTGAACGAATCTGTTGACATCAACTGCAYCCGT 
CCGARCAACAACACCARAARARGMATCMVCATSGGCCCG 
GGCARAGYTWTCYACRCTAYCSRGSRMWTCRYTGGTRAC 
ATCCGTMAGGCTCACTGCAACATCTCTCGTGCTAAATGG 
AACAACACT -3' (Seq ID No. 6) 

This degenerate nucleotide sequence codes for polypeptide antigens represented by the general 
sequence: 

Val Gln Leu Asn Glu Ser Val Glu Ile Asn Cys Xaa1 Arg Pro Xaa2 Asn 
Asn Thr Xaa3 Xaa4 Xaa5 Ile Xaa6 Xaa7 Gly Pro Gly Xaa8 Xaa9 Xaa10 
Xaa11 Xaa12 Xaa13 Xaa14 Xaa15 Xaa16 Xaa17 Gly Xaa18 Ile Arg 
Xaa19 Ala His Cys Asn Ile Ser Arg Ala Lys Trp Asn Asn Thr 
(Seq ID No. 7) 

where Xaa1 is selected from the group consisting of Thr and Ile; 
Xaa2 is selected from the group consisting of Asn and Ser; 
Xaa3 is selected from the group consisting of Arg and Lys; 
Xaa4 is selected from the group consisting of Arg and Lys; 
Xaa5 is selected from the group consisting of Arg, Gly and Ser; 
Xaa6 is selected from the group consisting of Asn, Pro, Ser, Arg, Thr and His; 
Xaa7 is selected from the group consisting of Met and Ile; 
Xaa8 is selected from the group consisting of Lys and Arg; 
Xaa9 is selected from the group consisting of Val and Ala; 
Xaa10 is selected from the group consisting of Ile and Phe; 
Xaa11 is selected from the group consisting of His and Tyr; 
Xaa12 is selected from the group consisting of Ala and Thr; 






Xaa13 is selected from the group consisting of Ile and Thr; 
Xaa14 is selected from the group consisting of Arg, Asn, Glu and Gly; 
Xaa15 is selected from the group consisting of Gly, Arg, His, Gln, Asp and Glu; 
Xaa16 is selected from the group consisting of Phe and Ile; 
Xaa17 is selected from the group consisting of Ala, Thr, Val and Ile; 
Xaa18 is selected from the group consisting of Asn and Asp; 
Xaa19 is selected from the group consisting of Lys and Gln. 

As described above, the degenerate oligonucleotide can be enzymatically ligated 
with the appropriate DNA sequences to create a gene library which codes for proteins 
comprising the polypeptide antigen set. 
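
One way to read the selection rule used in this example is: at each position, take amino acid types in descending order of frequency until their combined frequency reaches the chosen coverage. The Python sketch below implements that reading as an assumption, not as the definitive procedure; the function name `select_residues` is hypothetical, and entries reported only as "<1" in Table 3 are omitted from the inputs.

```python
def select_residues(freqs, coverage):
    """Pick the most frequent residues at one position until their summed
    frequency (in percent) reaches the requested coverage."""
    chosen, total = [], 0.0
    ranked = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
    for residue, percent in ranked:
        if total >= coverage:
            break
        chosen.append(residue)
        total += percent
    return chosen


# Frequencies for V3 positions 12 and 13, taken from Table 3.
position_12 = {"K": 76.5, "R": 17.4, "Q": 2.8, "N": 2.1}
position_13 = {"S": 64.8, "G": 21.7, "R": 10.6}

print(select_residues(position_12, 90))  # Lys and Arg
print(select_residues(position_13, 90))  # Ser, Gly and Arg
print(select_residues(position_13, 80))  # Ser and Gly
```

For these two positions the result agrees with the alternatives listed for the corresponding Xaa positions. At other positions the listed alternatives can differ, because the degenerate codon chosen to encode the selected residues may also encode additional amino acids.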

Likewise, the most commonly occurring amino acid types at each position 
which are collectively represented in at least 80% of the variants were determined. In this 
instance, the degenerate oligonucleotide sequence determined is represented by the 
general sequence: 

5'- GTT CAG CTG AAC GAA TCT GTT GAG ATC AAC TGC Z£Q CGT 
CCG AAC AAC AAC ACC CGT ARA RGC ATC MVC ATS GGC CCG 
GCC CGT GCT WTC TAC RCT ACC GRG SRM ATC RTT GGT GAC 
ATC CGT CAG GCT CAC TGC AAC ATC TCT CGT GCT AAA TGG 
AAC AAC ACT -3' (Seq ID No. 8) 

and corresponds to the polypeptide antigen set represented by the following general 
sequence: 






Val Gln Leu Asn Glu Ser Val Glu Ile Asn Cys Thr Arg Pro Asn Asn Asn 
Thr Arg Xaa1 Xaa2 Ile Xaa3 Xaa4 Gly Pro Gly Arg Ala Xaa5 Tyr Xaa6 
Thr Xaa7 Xaa8 Ile Xaa9 Gly Asp Ile Arg Gln Ala His Cys Asn Ile Ser Arg 
Ala Lys Trp Asn Asn Thr (Seq ID No. 9) 

where Xaa1 is selected from the group consisting of Lys and Arg; 
Xaa2 is selected from the group consisting of Ser and Gly; 
Xaa3 is selected from the group consisting of His, Arg, Pro, Thr and Asn; 
Xaa4 is selected from the group consisting of Ile and Met; 
Xaa5 is selected from the group consisting of Phe and Ile; 
Xaa6 is selected from the group consisting of Thr and Ala; 
Xaa7 is selected from the group consisting of Gly and Glu; 
Xaa8 is selected from the group consisting of Glu, Asp, Arg, Lys and Gln; 
Xaa9 is selected from the group consisting of Ile and Val. 

When the population of HIV-1 V3 loop variants analyzed was selected so as to
include sequences from HIV-1 isolates of Ugandan origin (Oram et al. (1991) AIDS
Research and Human Retroviruses 2:605, incorporated by reference herein), the
following degenerate oligonucleotide sequence was determined:

5'- GTTCAGCTGAACGAATCTGTTGACATCAACTGCWCCCGTCCGWACAAMAA
SAYCAKAMAGRGMMTSMVCMTSGGCCCGGGCMRAGYTDTSYWCACTACCR
RAADAAYCGGCKACATCSGTCAGGCTYACTGCAACATCTCTCGTGCTAAA
TGGAACAACACT -3' (Seq ID No. 10)

This set codes for the following polypeptide antigen set, given by the general formula:

Val Gln Leu Asn Glu Ser Val Glu Ile Asn Cys Xaa1 Arg Pro Xaa2 Xaa3 Xaa4
Xaa5 Xaa6 Xaa7 Xaa8 Xaa9 Xaa10 Xaa11 Gly Pro Gly Xaa12 Xaa13 Xaa14
Xaa15 Thr Thr Xaa16 Xaa17 Xaa18 Gly Xaa19 Ile Xaa20 Gln Ala Xaa21 Cys
Asn Ile Ser Arg Ala Lys Trp Asn Asn Thr (Seq ID No. 11)


where Xaa1 is selected from the group consisting of Thr and Ser;
Xaa2 is selected from the group consisting of Asn and Tyr;
Xaa3 is selected from the group consisting of Asn and Lys;
Xaa4 is selected from the group consisting of Asn and Lys;
Xaa5 is selected from the group consisting of Thr and Ile;
Xaa6 is selected from the group consisting of Arg and Ile;
Xaa7 is selected from the group consisting of Lys and Gln;
Xaa8 is selected from the group consisting of Ser, Gly and Arg;
Xaa9 is selected from the group consisting of Ile, Met and Leu;
Xaa10 is selected from the group consisting of His, Asn, Arg, Ser, Pro and Thr;
Xaa11 is selected from the group consisting of Ile, Met and Leu;
Xaa12 is selected from the group consisting of Arg, Lys and Gln;
Xaa13 is selected from the group consisting of Ala and Val;
Xaa14 is selected from the group consisting of Phe, Leu, Ile, Met and Val;
Xaa15 is selected from the group consisting of Tyr, Phe, His and Leu;
Xaa16 is selected from the group consisting of Gly, Lys, Arg and Glu;
Xaa17 is selected from the group consisting of Ile, Lys and Arg;
Xaa18 is selected from the group consisting of Ile and Thr;
Xaa19 is selected from the group consisting of Asp and Tyr;
Xaa20 is selected from the group consisting of Arg and Gly;
Xaa21 is selected from the group consisting of His and Tyr.
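
The relationship between a degenerate codon and the residues it can encode may be easier to follow with a small worked example. The sketch below is illustrative only and is not part of the described procedure: it assumes the standard IUPAC ambiguity codes and the standard genetic code, and the function name `encoded_residues` is hypothetical. It expands three of the degenerate codons that appear in Seq ID No. 10 (RGM, MTS and MVC).

```python
from itertools import product

# IUPAC nucleotide ambiguity codes (standard nomenclature).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def encoded_residues(degenerate_codon):
    """Return the set of residues (three-letter codes) a degenerate codon can encode."""
    choices = [IUPAC[base] for base in degenerate_codon.upper()]
    return {THREE_LETTER[CODON_TABLE["".join(c)]] for c in product(*choices)}


if __name__ == "__main__":
    for codon in ("RGM", "MTS", "MVC"):
        print(codon, sorted(encoded_residues(codon)))
```

Under these assumptions the three codons expand to the residue groups listed above for Xaa8, Xaa9 and Xaa10, respectively. A degenerate codon can also encode residues beyond those selected, so a check of this kind is only a sanity test of a design.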

To synthesize this degenerate oligonucleotide sequence, the following synthetic
oligonucleotides were produced.

#CR11 5'-CGCGAATTCTCCATGGTTCAGCTGAACGAATCTGTTGAC-3'
(Seq ID No. 12)

#CR22 5'-CGGACGGGWGCAGTTGATGTCAACAGATTCGTTCA-3'
(Seq ID No. 13)

#920465 5'-ATCAACTGCWCCCGTCCGWACAAMAASAYCAKAMAGRGMM
TSMVCMTSGGCCCGG-3'
(Seq ID No. 14)

#920466 5'-ATGTTGCAGTRAGCCTGACSGATGTMGCCGRTTHTTYYGG
TAGTGWRSAHARCTYKACCCGGGCCSAKG-3'
(Seq ID No. 15)

#CR23 5'-CAGGCTYACTGCAACATCTCTCGTGCTAAATGGAACA-3'
(Seq ID No. 16)

#CR12 5'-CGCGTCGACAGTGTTGTTCCATTTAGCACGAGAG-3'
(Seq ID No. 17)
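
As an illustrative check of how these oligonucleotides can overlap, the sketch below computes the reverse complement of #CR22 and tests whether it spans the junction between #CR11 and #920465. It is not part of the described procedure. It assumes that ambiguity codes are complemented symbolically (for example, W pairs with W and R with Y), and the helper name `reverse_complement` is hypothetical.

```python
# Complement table covering the standard IUPAC nucleotide codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Sequences as given for Seq ID Nos. 12, 13 and 14.
CR11 = "CGCGAATTCTCCATGGTTCAGCTGAACGAATCTGTTGAC"
CR22 = "CGGACGGGWGCAGTTGATGTCAACAGATTCGTTCA"
OLIGO_920465 = "ATCAACTGCWCCCGTCCGWACAAMAASAYCAKAMAGRGMMTSMVCMTSGGCCCGG"

# Top strand formed when CR11 and 920465 lie end to end.
top_strand = CR11 + OLIGO_920465
bridge = reverse_complement(CR22)

if __name__ == "__main__":
    print("reverse complement of CR22:", bridge)
    print("bridges CR11/920465 junction:", bridge in top_strand)
```

With the sequences as listed, the reverse complement of #CR22 matches the last 17 nucleotides of #CR11 followed by the first 18 nucleotides of #920465, which is consistent with #CR22 acting as a splint across that junction.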

These oligonucleotides were assembled as follows:

5'-   CR11     920465     CR23   -3'
3'-   CR22     920466     CR12   -5'

Stock solutions of each of the above oligonucleotides were made by dissolving
the oligonucleotide in sterile water. Aliquots of the stock solutions were kinased, then
mixed together for annealing in Klenow buffer and sterile water. The reaction mixture
was heated to 94°C and slowly cooled at 1°C per 15 seconds. To the reaction mixture
was then added dGTP, dCTP, dTTP, dATP, Klenow, ATP and ligase. The mixture was
incubated at room temperature overnight. The mixture was then precipitated with 95%
ethanol and the DNA pellet was washed with 70% ethanol, dried and dissolved in 20
microliters of sterile water.
The isolated DNA sequences were then PCR amplified using CR11 and CR12 as
the 5' and 3' amplimers, respectively. The 5' amplimer has EcoRI, NcoI and PvuII sites
and the 3' amplimer has a SalI site, which allows for cloning into several expression
systems.

The PCR products were isolated upon gel electrophoresis, cleaned with
"Geneclean" (Bio101) and cut with EcoRI and SalI restriction enzymes. The restricted DNA
library was then cloned in pFLAG plasmid (IBI FLAG Biosystem, Catalogue Number:
IB13000) treated with EcoRI and SalI and with calf intestinal phosphatase. The library
of vectors so produced codes for an in-frame fusion of FLAG peptide gene and V3 gene
variants. Upon induction with IPTG, a library of fusion polypeptides composed of the
FLAG peptide (amino-terminus) and V3 loop variants (carboxy-terminus) was produced.

The PCR products (V3 loop gene library) can also be cut with PvuII and SalI and
ligated into either the pEZZmp18 or pEBBmp18 expression system (Stahl et al. (1989)
J. Immunol. Meth., incorporated by reference herein) to create a library of fusion
proteins comprising a staphylococcal protein and the V3 sequences. All three plasmids
contain the pBR322 ori to drive replication in the appropriate host organism and the f1 ori
to facilitate site-directed mutagenesis.

For example, the PCR products generated above were cut with NcoI and SalI and
ligated to the double-stranded oligonucleotide:

5'- AATTCCGACGACGATGACAAATC -3' (Seq ID No. 18)
3'- GGCTGCTGCTACTGTTTAGGTAC -5' (Seq ID No. 19)

which encodes an enterokinase cleavage recognition (EKCR) sequence in frame with the
V3 loop coding sequence.

The resulting EKCR/V3 loop fusion gene was then ligated into the EcoRI and
SalI sites of pEZZ-18 (Pharmacia Catalog No. 27-4810-01) using the EcoRI and SalI
overhangs created by treating the EKCR/V3 fusion gene with the corresponding
restriction endonucleases. The pEZZ-18 vector contains the protein A signal sequence
and two synthetic "Z" domains which are based on the "B" IgG binding domain of
protein A. This construct allows "ZZ" fusion proteins to be secreted from the host cells
and to have increased solubility in aqueous environments, as well as facilitating affinity
purification of the fusion protein on IgG columns. Thus, the resulting fusion gene
encodes a ZZ/EKCR/V3 loop fusion protein. The protein A sequences can be removed
from V3 by treatment of the resulting fusion protein with enterokinase. See Su et al.
(1992) Biotechniques 11:756; and Forsberg et al. (1992) J. Protein Chem. 11:201,
incorporated by reference herein.

The pEZZ-18 constructs were generated and used to transform competent cells.
The resulting fusion proteins were then purified. Briefly, cells carrying the
ZZ/EKCR/V3 construct were grown in 2xYT medium. The cells were harvested at late
stationary phase of growth and resuspended in sonication buffer (20 mM Na acetate, pH
5.5, 1 mM PMSF, 5 mM CHAPS, 10% glycerol, 2 µg/mL aprotinin). After sonication,
the lysate was centrifuged at 10,000 rpm in a Beckman JA-17 rotor for 15 minutes. The
supernatant was then subjected to affinity purification on a column comprising IgG,
according to the manufacturer's protocols. The affinity purified V3 loop was subjected to
further purification by PAGE and electroelution.

The set of polypeptide antigens can be used to raise polyclonal sera in rabbits by
standard immunization procedures, and a polyclonal antibody mixture purified. HIV
infectivity and gp120/CD4 binding assays can be used to test the effectiveness of the set
of polypeptide antigens in eliciting an immune response (e.g., antibody response) against
variants of HIV.

The polyclonal sera can also be used to artificially apply selective mutational 
pressure on the virus, in order to compress the evolutionary timetable. For instance, 
HIV-infected cells which are able to mutate in a manner which allows the new variant to 
escape recognition by the polyclonal sera can be scored for. These variants will be 
sequenced, and incorporated into a population analysis in a weighted manner so as to be 
included in a subsequent set of polypeptide antigens. 

Equivalents

Those skilled in the art will recognize, or be able to ascertain using no more than 
routine experimentation, numerous equivalents to the specific procedures described 
herein. Such equivalents are considered to be within the scope of this invention and are 
covered by the following claims.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: CREA, ROBERTO
(ii) TITLE OF INVENTION: COMBINATORIAL POLYPEPTIDE ANTIGENS
(iii) NUMBER OF SEQUENCES: 19
(iv) CORRESPONDENCE ADDRESS:
    (A) ADDRESSEE: LAHIVE & COCKFIELD
    (B) STREET: 60 STATE STREET, SUITE 510
    (C) CITY: BOSTON
    (D) STATE: MA
    (E) COUNTRY: USA
    (F) ZIP: 02109
(v) COMPUTER READABLE FORM:
    (A) MEDIUM TYPE: Floppy disk
    (B) COMPUTER: IBM PC compatible
    (C) OPERATING SYSTEM: PC-DOS/MS-DOS
    (D) SOFTWARE: ASCII TEXT
(vi) CURRENT APPLICATION DATA:
    (A) APPLICATION NUMBER:
    (B) FILING DATE: 18-JUN-1993
    (C) CLASSIFICATION:
(vii) PRIOR APPLICATION DATA:
    (A) APPLICATION NUMBER: US 07/900,123
    (B) FILING DATE: 18-JUN-1992
(viii) ATTORNEY/AGENT INFORMATION:
    (A) NAME: DeConti, Giulio A.
    (B) REGISTRATION NUMBER: 31,503
    (C) REFERENCE/DOCKET NUMBER: CTE-003PC
(ix) TELECOMMUNICATION INFORMATION:
    (A) TELEPHONE: (617) 227-7400
    (B) TELEFAX: (617) 227-5941


(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 35 amino acids
    (B) TYPE: amino acid
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

Cys Thr Arg Pro Asn Asn Asn Thr Arg Lys Ser Ile His Ile Gly Pro
1               5                   10                  15
Gly Arg Ala Leu Tyr Thr Thr Gly Glu Ile Ile Gly Asp Ile Arg Gln
            20                  25                  30
Ala His Cys
        35

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 36 amino acids
    (B) TYPE: amino acid
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Cys Thr Arg Pro Asn Asn Asn Thr Arg Lys Lys Ile Arg Ile Gln Arg
1               5                   10                  15
Gly Pro Gly Arg Ala Phe Val Thr Ile Gly Lys Ile Gly Asn Met Arg
            20                  25                  30
Gln Ala His Cys
        35

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 35 amino acids
    (B) TYPE: amino acid
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Cys Thr Arg Pro Asn Asn Asn Thr Arg Lys Ser Ile Thr Ile Gly Pro
1               5                   10                  15
Gly Arg Ala Phe Tyr Ala Thr Gly Asp Ile Ile Gly Asp Ile Arg Gln
            20                  25                  30
Ala His Cys
        35

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 34 amino acids
    (B) TYPE: amino acid
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Cys Thr Arg Pro Asn Asn Asn Thr Arg Lys Lys Ile Arg Ile Gly Pro
1               5                   10                  15
Gly Arg Ala Phe Val Thr Ile Gly Lys Ile Gly Asn Met Arg Gln Ala
            20                  25                  30
His Cys

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 35 amino acids
    (B) TYPE: amino acid
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Cys Thr Arg Pro Asn Asn Asn Thr Arg Lys Ser Ile His Ile Gly Pro
1               5                   10                  15
Gly Arg Ala Phe Tyr Thr Thr Gly Glu Ile Ile Gly Asp Ile Arg Gln
            20                  25                  30
Ala His Cys
        35

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 165 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

GTTCAGCTGA ACGAATCTGT TGACATCAAC TGCAYCCGTC CGARCAACAA CACCARAARA   60
RGMATCMVCA TSGGCCCGGG CARAGYTWTC YACRCTAYCS RGSRMWTCRY TGGTRACATC  120
CGTMAGGCTC ACTGCAACAT CTCTCGTGCT AAATGGAACA ACACT                  165

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 55 amino acids
    (B) TYPE: amino acid
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: internal

(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 12
    (D) OTHER INFORMATION: /note= "Xaa is Thr or Ile"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 15
    (D) OTHER INFORMATION: /note= "Xaa is Asn or Ser"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 19
    (D) OTHER INFORMATION: /note= "Xaa is Arg or Lys"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 20
    (D) OTHER INFORMATION: /note= "Xaa is Arg or Lys"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 21
    (D) OTHER INFORMATION: /note= "Xaa is Arg, Gly or Ser"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 23
    (D) OTHER INFORMATION: /note= "Xaa is Asn, Pro, Ser, Arg, Thr or His"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 24
    (D) OTHER INFORMATION: /note= "Xaa is Met or Ile"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 28
    (D) OTHER INFORMATION: /note= "Xaa is Lys or Arg"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 29
    (D) OTHER INFORMATION: /note= "Xaa is Val or Ala"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 30
    (D) OTHER INFORMATION: /note= "Xaa is Ile or Phe"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 31
    (D) OTHER INFORMATION: /note= "Xaa is His or Tyr"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 32
    (D) OTHER INFORMATION: /note= "Xaa is Ala or Thr"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 33
    (D) OTHER INFORMATION: /note= "Xaa is Ile or Thr"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 34
    (D) OTHER INFORMATION: /note= "Xaa is Arg, Asn, Glu or Gly"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 35
    (D) OTHER INFORMATION: /note= "Xaa is Gly, Arg, His, Gln, Asp or Glu"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 36
    (D) OTHER INFORMATION: /note= "Xaa is Phe or Ile"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 37
    (D) OTHER INFORMATION: /note= "Xaa is Ala, Thr, Val or Ile"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 39
    (D) OTHER INFORMATION: /note= "Xaa is Asn or Asp"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 42
    (D) OTHER INFORMATION: /note= "Xaa is Lys or Gln"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Val Gln Leu Asn Glu Ser Val Glu Ile Asn Cys Xaa Arg Pro Xaa Asn
1               5                   10                  15
Asn Thr Xaa Xaa Xaa Ile Xaa Xaa Gly Pro Gly Xaa Xaa Xaa Xaa Xaa
            20                  25                  30
Xaa Xaa Xaa Xaa Xaa Gly Xaa Ile Arg Xaa Ala His Cys Asn Ile Ser
        35                  40                  45
Arg Ala Lys Trp Asn Asn Thr
    50                  55

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 165 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

GTTCAGCTGA ACGAATCTGT TGACATCAAC TGCACCCGTC CGAACAACAA CACCCGTARA   60
RGCATCMVCA TSGGCCCGGC CCGTGCTWTC TACRCTACCG RGSRMATCRT TGGTGACATC  120
CGTCAGGCTC ACTGCAACAT CTCTCGTGCT AAATGGAACA ACACT                  165

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 55 amino acids
    (B) TYPE: amino acid
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: internal

(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 20
    (D) OTHER INFORMATION: /note= "Xaa is Lys or Arg"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 21
    (D) OTHER INFORMATION: /note= "Xaa is Ser or Gly"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 23
    (D) OTHER INFORMATION: /note= "Xaa is His, Arg, Pro, Thr or Asn"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 24
    (D) OTHER INFORMATION: /note= "Xaa is Ile or Met"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 30
    (D) OTHER INFORMATION: /note= "Xaa is Phe or Ile"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 32
    (D) OTHER INFORMATION: /note= "Xaa is Thr or Ala"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 34
    (D) OTHER INFORMATION: /note= "Xaa is Gly or Glu"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 35
    (D) OTHER INFORMATION: /note= "Xaa is Glu, Asp, Arg, Lys or Gln"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 37
    (D) OTHER INFORMATION: /note= "Xaa is Ile or Val"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Val Gln Leu Asn Glu Ser Val Glu Ile Asn Cys Thr Arg Pro Asn Asn
1               5                   10                  15
Asn Thr Arg Xaa Xaa Ile Xaa Xaa Gly Pro Gly Arg Ala Xaa Tyr Xaa
            20                  25                  30
Thr Xaa Xaa Ile Xaa Gly Asp Ile Arg Gln Ala His Cys Asn Ile Ser
        35                  40                  45
Arg Ala Lys Trp Asn Asn Thr
    50                  55

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 162 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

GTTCAGCTGA ACGAATCTGT TGACATCAAC TGCWCCCGTC CGWACAAMAA SAYCAKAMAG   60
RGMMTSMVCM TSGGCCCGGG CMRAGYTDTS YWCACTACCR RAADAAYCGG CKACATCSGT  120
CAGGCTYACT GCAACATCTC TCGTGCTAAA TGGAACAACA CT                     162

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 54 amino acids
    (B) TYPE: amino acid
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: internal

(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 12
    (D) OTHER INFORMATION: /note= "Xaa is Thr or Ser"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 15
    (D) OTHER INFORMATION: /note= "Xaa is Asn or Tyr"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 16
    (D) OTHER INFORMATION: /note= "Xaa is Asn or Lys"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 17
    (D) OTHER INFORMATION: /note= "Xaa is Asn or Lys"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 18
    (D) OTHER INFORMATION: /note= "Xaa is Thr or Ile"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 19
    (D) OTHER INFORMATION: /note= "Xaa is Arg or Ile"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 20
    (D) OTHER INFORMATION: /note= "Xaa is Lys or Gln"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 21
    (D) OTHER INFORMATION: /note= "Xaa is Ser, Gly or Arg"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 22
    (D) OTHER INFORMATION: /note= "Xaa is Ile, Met or Leu"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 23
    (D) OTHER INFORMATION: /note= "Xaa is His, Asn, Arg, Ser, Pro or Thr"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 24
    (D) OTHER INFORMATION: /note= "Xaa is Ile, Met or Leu"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 28
    (D) OTHER INFORMATION: /note= "Xaa is Arg, Lys or Gln"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 29
    (D) OTHER INFORMATION: /note= "Xaa is Ala or Val"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 30
    (D) OTHER INFORMATION: /note= "Xaa is Phe, Leu, Ile, Met or Val"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 31
    (D) OTHER INFORMATION: /note= "Xaa is Tyr, Phe, His or Leu"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 34
    (D) OTHER INFORMATION: /note= "Xaa is Gly, Lys, Arg or Glu"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 35
    (D) OTHER INFORMATION: /note= "Xaa is Ile, Lys or Arg"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 36
    (D) OTHER INFORMATION: /note= "Xaa is Ile or Thr"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 38
    (D) OTHER INFORMATION: /note= "Xaa is Asp or Tyr"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 40
    (D) OTHER INFORMATION: /note= "Xaa is Arg or Gly"
(ix) FEATURE:
    (A) NAME/KEY: Modified-site
    (B) LOCATION: 43
    (D) OTHER INFORMATION: /note= "Xaa is His or Tyr"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Val Gln Leu Asn Glu Ser Val Glu Ile Asn Cys Xaa Arg Pro Xaa Xaa
1               5                   10                  15
Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Gly Pro Gly Xaa Xaa Xaa Xaa Thr
            20                  25                  30
Thr Xaa Xaa Xaa Gly Xaa Ile Xaa Gln Ala Xaa Cys Asn Ile Ser Arg
        35                  40                  45
Ala Lys Trp Asn Asn Thr
    50

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 39 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

CGCGAATTCT CCATGGTTCA GCTGAACGAA TCTGTTGAC                          39

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 35 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

CGGACGGGWG CAGTTGATGT CAACAGATTC GTTCA                              35

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 55 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

ATCAACTGCW CCCGTCCGWA CAAMAASAYC AKAMAGRGMM TSMVCMTSGG CCCGG        55

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 69 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

ATGTTGCAGT RAGCCTGACS GATGTMGCCG RTTHTTYYGG TAGTGWRSAH ARCTYKACCC   60
GGGCCSAKG                                                           69

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 37 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

CAGGCTYACT GCAACATCTC TCGTGCTAAA TGGAACA                            37

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 34 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

CGCGTCGACA GTGTTGTTCC ATTTAGCACG AGAG                               34

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 23 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

AATTCCGACG ACGATGACAA ATC                                           23

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 23 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

GGCTGCTGCT ACTGTTTAGG TAC                                           23

Claims
1. A method of producing a set of polypeptide antigens having amino acid
sequences derived from a population of variants of a protein or a portion thereof,
comprising the steps of:

a. selecting a protein, or a portion thereof, which exhibits a population of N
variants, represented by the formula

A_1 A_2 A_3 .... A_{n-2} A_{n-1} A_n,

where A_n is an amino acid occurring at amino acid position n of the protein,
or portion thereof;

b. determining the number of times O_n^aa each type of amino acid occurs at
each amino acid position n in the N variants;

c. calculating the frequency of occurrence (O_n^aa/N)_n of each type of amino
acid at each amino acid position n in the N variants; and

d. generating a set of polypeptide antigens having amino acid sequences
represented substantially by the formula

A'_1 A'_2 A'_3 .... A'_{n-2} A'_{n-1} A'_n,

where A'_n is defined as an amino acid type which occurs at greater than a
selected frequency at the corresponding amino acid position in the N
variants.
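
Steps b through d of the method can be sketched computationally. The example below is a minimal illustration and not the applicant's implementation: the aligned toy variants are invented for demonstration, one-letter residue codes are used for brevity, and the function name `select_residues` is hypothetical. The thresholds of 10% and 5% correspond to the selected frequencies recited in the dependent claims.

```python
from collections import Counter


def select_residues(variants, threshold):
    """For each position, return residues occurring at a frequency above threshold.

    variants:  list of equal-length, pre-aligned sequences (the N variants)
    threshold: selected frequency, as a fraction (e.g. 0.10 for 10%)
    """
    n_variants = len(variants)
    length = len(variants[0])
    if any(len(v) != length for v in variants):
        raise ValueError("variants must be aligned to the same length")

    selected = []
    for pos in range(length):
        # Step b: number of times each amino acid type occurs at this position.
        counts = Counter(v[pos] for v in variants)
        # Step c: frequency of occurrence of each amino acid type.
        freqs = {aa: count / n_variants for aa, count in counts.items()}
        # Step d: keep the types occurring at greater than the selected frequency.
        selected.append(sorted(aa for aa, f in freqs.items() if f > threshold))
    return selected


# Hypothetical toy population of N = 10 aligned variants (not real V3 data).
TOY_VARIANTS = ["GRAF"] * 6 + ["GKAF"] * 2 + ["GRVI"] + ["GQAF"]

if __name__ == "__main__":
    print(select_residues(TOY_VARIANTS, 0.10))
    print(select_residues(TOY_VARIANTS, 0.05))
```

In this toy population, lowering the selected frequency from 10% to 5% admits the residues that occur in exactly one of the ten variants, which enlarges the resulting antigen set.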

2. A method according to claim 1, wherein the set of polypeptide antigens range
from 8 to 150 amino acids.

3. A method according to claim 1, wherein the selected frequency is 5%.

4. A method according to claim 1, wherein the selected frequency is 10%.

5. A method according to claim 1, wherein the protein is an immunogenic
component of a pathogen.

6. A method according to claim 5, wherein the immunogenic component of the
pathogen contains one or more neutralizing antigenic epitopes.

7. A method according to claim 5, wherein the protein is a component of a virus.

8. A method according to claim 7, wherein the protein is a component of HIV-1.

9. A method according to claim 8, wherein the protein is gp120 or a portion thereof.

10. A method according to claim 9, wherein the portion of a protein is the V3 loop of
gp120.

11. A method according to claim 1, wherein the set of polypeptide antigens is
generated by:

i) synthesizing a degenerate oligonucleotide sequence having a
minimum number of nucleotide combinations at each codon position n,
which combinations include at least the codons coding for each type of
amino acid A'_1 to A'_n; and

ii) expressing the degenerate oligonucleotide in an expression system
to produce the set of polypeptide antigens:

A'_1 A'_2 A'_3 .... A'_{n-2} A'_{n-1} A'_n

12. A set of polypeptide antigens produced by the method of claim 1.

13. A vaccine composition comprising the set of polypeptide antigens of claim 12
and a pharmaceutically acceptable vehicle.

14. A set of polypeptide antigens derived from the V3 loop of a population of HIV-1
variants, the set of polypeptide antigens represented by the amino acid sequence:

Val Gln Leu Asn Glu Ser Val Glu Ile Asn Cys Xaa1 Arg Pro Xaa2 Xaa3 Xaa4
Xaa5 Xaa6 Xaa7 Xaa8 Xaa9 Xaa10 Xaa11 Gly Pro Gly Xaa12 Xaa13 Xaa14
Xaa15 Thr Thr Xaa16 Xaa17 Xaa18 Gly Xaa19 Ile Xaa20 Gln Ala Xaa21 Cys
Asn Ile Ser Arg Ala Lys Trp Asn Asn Thr (Seq ID No. 11)

where Xaa1 is selected from the group consisting of Thr and Ser;
Xaa2 is selected from the group consisting of Asn and Tyr;
Xaa3 is selected from the group consisting of Asn and Lys;
Xaa4 is selected from the group consisting of Asn and Lys;
Xaa5 is selected from the group consisting of Thr and Ile;
Xaa6 is selected from the group consisting of Arg and Ile;
Xaa7 is selected from the group consisting of Lys and Gln;
Xaa8 is selected from the group consisting of Ser, Gly and Arg;
Xaa9 is selected from the group consisting of Ile, Met and Leu;
Xaa10 is selected from the group consisting of His, Asn, Arg, Ser, Pro and Thr;
Xaa11 is selected from the group consisting of Ile, Met and Leu;
Xaa12 is selected from the group consisting of Arg, Lys and Gln;
Xaa13 is selected from the group consisting of Ala and Val;
Xaa14 is selected from the group consisting of Phe, Leu, Ile, Met and Val;
Xaa15 is selected from the group consisting of Tyr, Phe, His and Leu;
Xaa16 is selected from the group consisting of Gly, Lys, Arg and Glu;
Xaa17 is selected from the group consisting of Ile, Lys and Arg;
Xaa18 is selected from the group consisting of Ile and Thr;
Xaa19 is selected from the group consisting of Asp and Tyr;
Xaa20 is selected from the group consisting of Arg and Gly;
Xaa21 is selected from the group consisting of His and Tyr.
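
To give a sense of scale, the number of distinct amino acid sequences nominally spanned by the general sequence of Seq ID No. 11 can be estimated by multiplying the number of alternatives at each variable position. The sketch below is illustrative only. It assumes every listed residue can occur independently at its position, and it ignores any additional residues that the degenerate codons might encode, so the figure is a nominal count rather than a measured library size. The name `XAA_CHOICES` is hypothetical.

```python
from math import prod

# Alternatives at each variable position Xaa1..Xaa21, as listed for Seq ID No. 11.
XAA_CHOICES = {
    1: ["Thr", "Ser"],
    2: ["Asn", "Tyr"],
    3: ["Asn", "Lys"],
    4: ["Asn", "Lys"],
    5: ["Thr", "Ile"],
    6: ["Arg", "Ile"],
    7: ["Lys", "Gln"],
    8: ["Ser", "Gly", "Arg"],
    9: ["Ile", "Met", "Leu"],
    10: ["His", "Asn", "Arg", "Ser", "Pro", "Thr"],
    11: ["Ile", "Met", "Leu"],
    12: ["Arg", "Lys", "Gln"],
    13: ["Ala", "Val"],
    14: ["Phe", "Leu", "Ile", "Met", "Val"],
    15: ["Tyr", "Phe", "His", "Leu"],
    16: ["Gly", "Lys", "Arg", "Glu"],
    17: ["Ile", "Lys", "Arg"],
    18: ["Ile", "Thr"],
    19: ["Asp", "Tyr"],
    20: ["Arg", "Gly"],
    21: ["His", "Tyr"],
}

# Nominal number of distinct polypeptides: product of the per-position counts.
nominal_set_size = prod(len(residues) for residues in XAA_CHOICES.values())

if __name__ == "__main__":
    print(f"{nominal_set_size:,} nominal sequences")
```

Under these assumptions the product is 477,757,440, which illustrates why the set is produced from a single degenerate oligonucleotide library rather than by synthesizing each member individually.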

15. A vaccine composition comprising the set of polypeptide antigens of claim 14
and a physiological vehicle.

16. A set of degenerate oligonucleotides represented substantially by the sequence,

5'- GTTCAGCTGAACGAATCTGTTGACATCAACTGCAYCCGTCCGW
ACAAMAASAYCAKAMAGRGMMTSMVCMTSGGCCCGGGCMRAGY
TDTSYWCACTACCRRAADAAYCGGCKACATCSGTCAGGCTYAC
TGCAACATCTCTCGTGCTAAATGGAACAACACT -3'. (Seq ID No. 10)

17. A library of expression vectors containing the oligonucleotide of claim 16.

18. A set of polypeptide antigens derived from the V3 loop of a population of HIV-1
variants, the set of polypeptide antigens represented by the general sequence:

Val Gln Leu Asn Glu Ser Val Glu Ile Asn Cys Xaa1 Arg Pro Xaa2 Asn Asn Thr
Xaa3 Xaa4 Xaa5 Ile Xaa6 Xaa7 Gly Pro Gly Xaa8 Xaa9 Xaa10 Xaa11 Xaa12
Xaa13 Xaa14 Xaa15 Xaa16 Xaa17 Gly Xaa18 Ile Arg Xaa19 Ala His Cys Asn
Ile Ser Arg Ala Lys Trp Asn Asn Thr (Seq ID No. 7)

where Xaa1 is selected from the group consisting of Thr and Ile;
Xaa2 is selected from the group consisting of Asn and Ser;
Xaa3 is selected from the group consisting of Arg and Lys;
Xaa4 is selected from the group consisting of Arg and Lys;
Xaa5 is selected from the group consisting of Arg, Gly and Ser;
Xaa6 is selected from the group consisting of Asn, Pro, Ser, Arg, Thr and His;
Xaa7 is selected from the group consisting of Met and Ile;
Xaa8 is selected from the group consisting of Lys and Arg;
Xaa9 is selected from the group consisting of Val and Ala;
Xaa10 is selected from the group consisting of Ile and Phe;
Xaa11 is selected from the group consisting of His and Thr;
Xaa12 is selected from the group consisting of Ala and Thr;
Xaa13 is selected from the group consisting of Ile and Thr;
Xaa14 is selected from the group consisting of Arg, Asn, His, Gln, Asp and Glu;
Xaa15 is selected from the group consisting of Gly, Arg, His, Gln, Asp and Glu;
Xaa16 is selected from the group consisting of Phe and Ile;
Xaa17 is selected from the group consisting of Ala, Thr, Val and Ile;
Xaa18 is selected from the group consisting of Asn and Asp;
Xaa19 is selected from the group consisting of Lys and Gln.

19. A vaccine composition comprising the set of polypeptide antigens of claim 18
and a physiological vehicle.

20. A degenerate oligonucleotide sequence represented substantially by the sequence,

5'- GTTCAGCTGAACGAATCTGTTGACATCAACTGCAYCCGT
CCGARCAACAACACCARAARARGMATCMVCATSGGCCCG
GGCARAGYTWTCYACRCTAYCSRGSRMWTCRYTGGTRAC
ATCCGTMAGGCTCACTGCAACATCTCTCGTGCTAAATGG
AACAACACT -3'. (Seq ID No. 6)

21. A library of expression vectors containing the oligonucleotide of claim 20.

22. A set of polypeptide antigens derived from the V3 loop of a population of HIV-1
variants, the set of polypeptide antigens represented by the general sequence:

Val Gln Leu Asn Glu Ser Val Glu Ile Asn Cys Thr Arg Pro Asn Asn Asn Thr Arg
Xaa1 Xaa2 Ile Xaa3 Xaa4 Gly Pro Gly Arg Ala Xaa5 Tyr Xaa6 Thr Xaa7 Xaa8
Ile Xaa9 Gly Asp Ile Arg Gln Ala His Cys Asn Ile Ser Arg Ala Lys Trp Asn Asn
Thr (Seq ID No. 9)

where Xaa1 is selected from the group consisting of Lys and Arg;
Xaa2 is selected from the group consisting of Ser and Gly;
Xaa3 is selected from the group consisting of His, Arg, Pro, Thr and Asn;
Xaa4 is selected from the group consisting of Ile and Met;
Xaa5 is selected from the group consisting of Phe and Ile;
Xaa6 is selected from the group consisting of Thr and Ala;
Xaa7 is selected from the group consisting of Gly and Glu;
Xaa8 is selected from the group consisting of Glu, Asp, Arg, Lys and Gln;
Xaa9 is selected from the group consisting of Ile and Val.

23. A vaccine composition comprising the set of polypeptide antigens of claim 22
and a physiological vehicle.

24. A set of degenerate oligonucleotides represented substantially by the sequence,

5'-GTT CAG CTG AAC GAA TCT GTT GAG ATC AAC TGC ACC CGT
CCG AAC AAC AAC ACC CGT ARA RGC ATC MVC ATS GGC CCG
GCC CGT GCT WTC TAC RCT ACC GRG SRM ATC RTT GGT GAC
ATC CGT CAG GCT CAC TGC AAC ATC TCT CGT GCT AAA TGG
AAC AAC ACT -3' (Seq ID No. 8)

25. A library of expression vectors containing the oligonucleotide of claim 24.